vunderba 3 days ago

I've added and tested this multimodal Gemini 2.0 model in my shoot-out of SOTA image gen models (OpenAI 4o, Midjourney 7, Flux, etc.), which contains a collection of increasingly difficult prompts.

https://genai-showdown.specr.net

I don't know how much of Google's original Imagen 3.0 is incorporated into this new model, but the overall aesthetic quality unfortunately seems significantly worse.

The big "wins" are:

- Multimodal aspect in trying to keep parity with OpenAI's offerings.

- An order of magnitude faster than OpenAI 4o image gen

  • ticulatedspline 2 days ago

    Excellent site! OpenAI 4o is more than mildly frightening in its ability to understand the prompt. It seems what's mostly holding it back is a tendency away from photo-realism (or even typical digital art styles) and its own safeguards.

    • echelon 2 days ago

      Multimodal is the only image generation modality that matters going forward. Flux, HiDream, Stable Diffusion, and the like are going to be relegated to the past once multimodal becomes more common. Text-to-image sucks, and image-to-image with all the ControlNets and Comfy nodes is cumbersome in comparison to true multimodal instructiveness.

      I hope that we get an open-weights multimodal image gen model. I'm slightly concerned that if these things take tens to hundreds of millions of dollars to train, only Google and OpenAI will provide them.

      That said, the one weakness in multimodal models is that they don't let you structure the outputs yet. Multimodal + ControlNets would fix that, and that would be like literally painting with the mind.

      The future, when these models are deeply refined and perfected, is going to be wild.

      • zaptrem 2 days ago

        Good chance a future llama will output image tokens

        • echelon 2 days ago

          That's my hope: that Llama or Qwen bring multimodal image generation capabilities to open source so we're not left in the dark.

          If that happens, then I'm sure we'll see slimmer multimodal models over the course of the next year or so. And that teams like Black Forest Labs will make more focused and performant multimodal variants.

          We need the incredible instructivity of multimodality. That's without question. But we also need to be able to fine tune, use ControlNets to guide diffusion, and to compose these into workflows.

    • troupo 2 days ago

      I also find it weird how it defaults/devolves into this overall brown-ish style. Once you see it, you see it everywhere

      • flir 2 days ago

        I've played around with "create an image based on this image" chains quite a lot, and yep, everything goes brown with 4o. You append the images to each other as a filmstrip and it's almost like a gradient.

        They also simplify over the generations (e.g. a basket full of stuff slowly loses the stuff), but I guess that's to be expected.

      • jlarcombe 2 days ago

        yes. it's absolutely horrible looking.

    • avereveard 2 days ago

      It's a bit expensive/slow, but for styled requests I let it do the base image, and when I'm happy with the composition I ask it to remake it as a picture or in whatever style is needed.

  • saretup 2 days ago

    What I found while using these models:

    For generating a new image, GPT 4o image gen is the best.

    For editing an existing image (while retaining parts of the original image) such as adding text or objects in the original image, Gemini 2.0 image gen model is the best (GPT 4o always changes the original image no matter what).

  • belter 3 days ago

    Your shoot-out site is very useful. Could I suggest adding prompts that expose common failure modes?

    For example, asking the models to show clocks set to a specific time or people drawing with their left hand. I think most, if not all, models will likely display every clock with the same time... and portray subjects drawing with their right hand.

    • vunderba 3 days ago

      @belter / @crooked-v

      Thanks for the suggestions. Most of the current prompts are a result of personal images that I wanted to generate - so I'll try to add some "classic GenAI failure modes". Musical instruments such as pianos also used to be a pretty big failure point.

      • troupo 2 days ago

        For personal images I often play with wooly mammoths, and most models are incapable of generating anything but textbook images. Any deviation either becomes an elephant or an abomination (bull- or bear-like monsters)

    • crooked-v 3 days ago

      Another one I would suggest is buildings with specific unusual proportions and details (e.g. "the mansion's west wing is twice the height of the right wing and has only very wide windows"). I've yet to find a model that will do that kind of thing reliably; it seems to just fall back on the vibes of whatever painting or book cover is vaguely similar to what's described.

      • droopyEyelids 3 days ago

        generating a simple maze for kids is also not possible yet

        • vunderba 3 days ago

          Love this one so I've added it. The concept is very easy for most GenAI models to grasp, but it requires a strong overall cohesive understanding. Rather unbelievably, OpenAI 4o managed to produce a pass.

          I should also add an image that is heavy with "greebles". GenAI usually lacks the fidelity for these kinds of minor details, so although it adds them, they tend to fall apart under more than a cursory examination.

          https://en.wikipedia.org/wiki/Greeble

  • esperent 2 days ago

    Your site is really useful, thanks for sharing. One issue is that the list of examples sticks to the top and covers more than half of the screen on mobile; could you add a way to hide it?

    If you're looking for other suggestions, a summary table showing which models are ahead would be great.

    • vunderba 2 days ago

      Great point - when I started building it I think I only had about four test cases, but now the nav bar is eating 50% of the vertical display, so I've removed it on mobile!

      Wrt the summary table, did you have a different metric in mind? The top of the display should already be showing a "Model Performance" chart with OpenAI 4 and Google Imagen 3 leading the pack.

      • esperent 2 days ago

        That's much easier to read now.

        > The top of the display should already be showing a "Model Performance" chart

        I guess I missed this earlier!

  • pkulak 2 days ago

    > That mermaid was quite the saucy tart.

    Really now?

  • andybak 2 days ago

    Any thoughts on how Ideogram would rank? I've not used it recently but I used to get the sense that it is (or was) a "contender".

  • liuliu 2 days ago

    Do you mind sharing which HiDream-I1 model you are using? I am getting better results with these prompts from my implementation inside Draw Things.

    • vunderba 2 days ago

      Sure - I was using "hidream-i1-dev", but if you're seeing better results I might rerun the HiDream tests with the "hidream-i1-full" model.

      I've been thinking about possibly rerunning the Flux Dev prompts using the 1.1 Pro but I liked having a base reference for images that can be generated on consumer hardware.

      • liuliu 2 days ago

        Yeah, I use the full model which is slightly better at some of these prompts.

  • croes 2 days ago

    How about including the simple cases where AI usually fails, like

    "Draw a clock showing the time of 09:30 a.m."

    ChatGPT still shows 01:50

    Or

    "Draw a painter painting a picture of the Eiffel Tower with his left hand"

    The painter is still right-handed.

simonw 2 days ago

Be a bit careful playing with this one. I tried this:

  curl -s -X POST \
    "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash-preview-image-generation:generateContent?key=$GEMINI_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "contents": [{
        "parts": [
          {"text": "Provide a vegetarian recipe for butter chicken but with chickpeas not chicken and include many inline illustrations along the way"}
        ]
      }],
      "generationConfig":{"responseModalities":["TEXT","IMAGE"]}
    }' > /tmp/out.json
And got back 41MB of JSON with 28 base64 images in it: https://gist.github.com/simonw/55894032b2c60b35f320b6a166ded...

At 4c per image that's more than a dollar on that single prompt.

I built this quick tool https://tools.simonwillison.net/gemini-image-json for pasting that JSON into, to see it rendered.
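
If you'd rather extract the images locally, a rough Python sketch like this should work (assuming the standard generateContent response shape, with the image bytes in base64 "inlineData" parts):

  # Rough sketch: decode the base64 images from the saved response.
  # Assumes candidates -> content -> parts, where image parts carry
  # {"inlineData": {"mimeType": ..., "data": <base64>}}.
  import base64
  import json

  with open("/tmp/out.json") as f:
      response = json.load(f)

  count = 0
  for candidate in response.get("candidates", []):
      for part in candidate.get("content", {}).get("parts", []):
          inline = part.get("inlineData")
          if inline and inline.get("data"):
              count += 1
              ext = inline.get("mimeType", "image/png").split("/")[-1]
              with open(f"/tmp/gemini_image_{count}.{ext}", "wb") as out:
                  out.write(base64.b64decode(inline["data"]))
  print(f"wrote {count} images")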

eminence32 3 days ago

This seems neat, I guess. But whenever I try tools like this, I often run into the limits of what I can describe in words. I might try something like "Add some clutter to the desk, including stacks of paper and notebooks" but when it doesn't quite look like what I want, I'm not sure what else to do except try slightly different wordings until the output happens to land on what I want.

I'm sure part of this is a lack of imagination on my part about how to describe the vague image in my own head. But I guess I have a lot of doubts about using a conversational interface for this kind of stuff

  • monster_truck 3 days ago

    Chucking images at any model that supports image input and asking it to describe specific areas/things "in extreme detail" is a decent way to get an idea of what it's expecting vs what you want.

    • thornewolf 3 days ago

      +1 to this flow. I use the exact same phrase "in extreme detail" as well haha. Additionally, I ask the model to describe what prompt it might write to produce some edit itself.

  • crooked-v 3 days ago

    I just tried a couple of cases that ChatGPT is bad at (reproducing certain scenes/setpieces from classic tabletop RPG adventures, like the weird pyramid from classic D&D B4 The Lost City), and Gemini fails in just about the same way, getting architectural proportions and scenery details wrong even when given simple, broad rules about them. Adding more detail seems kind of pointless when it can't even get basics like "creature X is about as tall as the building around it" or "the pyramid is surrounded by ruined buildings" right.

    • BoorishBears 3 days ago

      What's an example of a prompt you tried and it failed on?

  • qoez 3 days ago

    Maybe that's how the future will unfold. There will be subtle things AI fails to learn, and there will be differences in how good people are at making AI do things, which will be a new skill in itself and will end up being a determining difference in pay.

    • gowld 2 days ago

      This is "Prompt Engineering"

  • metalrain 2 days ago

    Exactly. With more complex compositions, lighting, and image enhancements/filters, there are so many things where you know how it should look, but describing it such that the LLM gets it and will reproduce it is pretty difficult.

    Sometimes sketching it could be helpful, but more abstract technical things like LUTs still feel out of reach.

  • xbmcuser 3 days ago

    Ask Gemini to word your thoughts better, then use those to do the image editing.

  • betterThanTexas 3 days ago

    > I'm sure part of this is a lack of imagination on my part about how to describe the vague image in my own head.

    This is more related to our ability to articulate than is easy to demonstrate, in my experience. I can certainly produce images in my head I have difficulty reproducing well and consistently via linguistic description.

    • SketchySeaBeast 3 days ago

      It's almost as if being able to create art accurate to our mental vision requires practice and skill, be it the ability to create an image or to write it and evoke an image in others.

      • betterThanTexas 3 days ago

        Absolutely! But this was surprising to me—my intuition says if I can firmly visualize something, I should be able to describe it. I think many people have this assumption and it's responsible for a lot of projection in our social lives.

        • SketchySeaBeast 3 days ago

          Yeah, it's probably a good argument for having people try some form of art, to have them understand that their intent and their outcome are rarely the same.

  • Nevermark 3 days ago

    Perhaps describe the types and styles of work associated with the desk, to give the clutter a coherent character.

  • bufferoverflow 2 days ago

    In that scenario, if you can't describe what you want with words, a human designer can't read your mind either.

    • Hasnep 2 days ago

      No, but a good designer will be able to help you put what you want into words.

      • gowld 2 days ago

        Ask the AI to help you put what you want into words.

        • maksimur 2 days ago

          I think the issue with AI (in contrast to human interaction) is its lack of real-time responsiveness. This slower back-and-forth can lead to frustration, especially if it takes a dozen or more messages to get the point across. Humans are also aided in helping you by contextual cues like gestures, facial expressions, or "shared qualia".

  • zoogeny 3 days ago

    I would politely suggest you work at getting better at this since it would be a pretty important skill in a world where a lot of creative work is done by AI.

    As some have mentioned, LLMs are treasure troves of information for learning how to prompt the LLM. One thing to get over is a fear of embarrassment in what you say to the LLM. Just write a stream of consciousness to the LLM about what you want and ask it to generate a prompt based on that. "I have an image that I am trying to get an image LLM to add some clutter to. But when I ask it to do it, like I say add some stack of paper and notebooks, but it doesn't look like I want because they are neat stacks of paper. What I want is a desk that kind of looks like it has been worked at for a while by a typical office worker, like at the end of the day with a half empty coffee cup and .... ". Just ramble away and then ask the LLM to give you the best prompt. And if it doesn't work, literally go back to the same message chain and say "I tried that prompt and it was [better|worse] than before because ...".

    This is one of those opportunities where life is giving you an option: give up or learn. Choose wisely.

refulgentis 3 days ago

Another release from Google!

Now I can use:

- Gemini 2.0 Flash Image Generation Preview (May) instead of Gemini 2.0 Flash Image Generation Preview (March)

- or when I need text, Gemini 2.5 Flash Thinking 04-17 Preview ("natively multimodal" w/o image generation)

- When I need to control thinking budgets, I can do that with Gemini 2.5 Flash Preview 04-17, with not-thinking at a 50% price increase over a month prior

- And when I need realtime, fallback to Gemini 2.0 Flash 001 Live Preview (announced as In Preview on April 9 2025 after the Multimodal Live API was announced as released on December 11 2024)

- I can't control Gemini 2.5 Pro Experimental/Preview/Preview IO Edition's thinking budgets, but good news follows in the next bullet: they'll swap the model out underneath me with one that thinks ~10x less, so at least it's in the same cost ballpark as their competitors

- and we all got autoupgraded from Gemini 2.5 Pro Preview (03/25 released 4/2) to Gemini 2.5 Pro Preview (IO Edition) yesterday! Yay!

  • justanotheratom 3 days ago

    Yay! Do you use your Gemini in the Gemini App, AI Studio, or Vertex AI?

    • refulgentis 3 days ago

      I am Don Quixote, building an app that abstracts over models (i.e. allows user choice), while providing them a user-controlled set of tools and allowing users to write their own "scripts", i.e. precanned dialogue/response steps to permit, e.g., building of search.

      Which is probably what makes me so cranky here. It's very hard keeping track of all of it and doing my best to lever up the models that are behind Claude's agentic capabilities, and all the Newspeak of Google PR makes it consume almost as much energy as the rest of the providers combined. (I'm v frustrated that I didn't realize till yesterday that 2.0 Flash had quietly gone from 10 RPM to 'you can actually use it')

      I'm a Xoogler and I get why this happens ("preview" is a magic wand that means "you don't have to get everyone in bureaucracy across DeepMind/Cloud/? to agree to get this done and fill out their damn launchcal"), but, man.

  • xnx 3 days ago

    A matrix of models, capabilities, and prices would be really useful.

mkl 2 days ago

> what the lamp from the second image would look like on the desk from the first image

The lamp is put on a different desk in a totally different room, with AI mush in the foreground. Props for not cherry-picking a first example, I guess. The sofa colour one is somehow much better, with a less specific instruction.

  • PurpleRamen 2 days ago

    It looks like the same table, but some of the legs are missing. Parts of the lamp are also missing. And the scaling of it looks really wrong.

    What a great, awful world we will have if people really start making decisions based on these results. I'm curious whether in a few years we will have people who seriously fancy AI-trash chic...

  • cyral 2 days ago

    That one is an odd example... especially since image #3 does a similar task with excellent accuracy in keeping the old image intact. I've had the same issues when trying to make it visualize adding decor; it ends up changing the whole room or the furniture materials.

minimaxir 3 days ago

Of note: Gemini 2.0 image generation is priced at $0.039 per image, which is more expensive than Imagen 3 ($0.03 per image): https://ai.google.dev/gemini-api/docs/pricing

The main difference is that Gemini allows incorporating a conversation to generate the image, as demoed here, while Imagen 3 is strict text-in/image-out with optional mask-constrained edits, but likely allows for higher-quality images overall if you're skilled with prompt engineering. This is a nuance that is annoying to differentiate.

  • vunderba 3 days ago

    Anecdotal, but from preliminary sandbox testing side-by-side with Gemini 2.0 Flash and Imagen 3.0, it definitely appears that's the case: higher overall visual quality from Imagen 3.

  • ipsum2 3 days ago

    > likely allows for higher-quality images overall

    What makes you say that?

GaggiX 3 days ago

Not available in the EU; the first version was, and then it was removed.

Btw, still not as good as ChatGPT but much, much faster; it's nice progress compared to the previous model.

thornewolf 3 days ago

Model outputs look good-ish. I think they are neat. I updated my recent hack project https://lifestyle.photo to the new model. It's middling-to-good.

There are a lot of failure modes still, but what I want is a very large cookbook showing what known-good workflows are. Since this is so directly downstream of (limited) training data, it might be that I am just prompting in an ever so slightly bad way.

  • sigmaisaletter 3 days ago

    Re your project: I'd expect at least the demo to not have an obvious flaw. The "lifestyle" version of your bag has a handle that is nearly twice as long as the "product" version.

    • thornewolf 3 days ago

      This is a fair critique. While I am merely a "LLM wrapper", I should put the product's best foot forward and pay more attention to my showcase examples.

  • nico 3 days ago

    Love your project, great application of gen AI, very straightforward value proposition, excellent and clear messaging

    Very well done!

    • thornewolf 3 days ago

      Thank you for the kind words! I am looking forward to creating a Show HN next week alongside a Product Hunt announcement. I appreciate any and all feedback. You can provide it through the website directly or through the email I have attached in my bio.

mNovak 3 days ago

I'm getting mixed results with the co-drawing demo, in terms of understanding what stick figures are, which seems pretty important for the 99% of us who can't draw a realistic human. I was hoping to sketch a scene, and let the model "inflate" it, but I ended up with 3D rendered stick figures.

Seems to help if you explicitly describe the scene, but then the drawing-along aspect seems relatively pointless.

Yiling-J 2 days ago

I generated 100 recipes with images using gemini-2.0-flash and gemini-2.0-flash-exp-image-generation as a demo of text+image generation in my open-source project: https://github.com/Yiling-J/tablepilot/tree/main/examples/10...

You can see the full table with images here: https://tabulator-ai.notion.site/1df2066c65b580e9ad76dbd12ae...

I think the results came out quite well. Be aware I don't generate a text prompt based on row data for image generation. Instead, the raw row data (ingredients, instructions...) and table metadata (column names and descriptions) are sent directly to gemini-2.0-flash-exp-image-generation.
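
Roughly, each request looks something like this (not the actual tablepilot code, just a sketch of the idea using the same generateContent REST shape as in the curl example upthread; the row and column data here are made up):

  # Sketch only: send raw row data plus table metadata straight to the
  # image-capable model, with no intermediate prompt-writing step.
  import json
  import os
  import requests

  row = {"Name": "Chickpea curry", "Ingredients": "chickpeas, tomato, cream", "Instructions": "..."}
  columns = {"Name": "dish name", "Ingredients": "comma-separated list", "Instructions": "cooking steps"}

  body = {
      "contents": [{"parts": [{"text": json.dumps({"row": row, "columns": columns})}]}],
      "generationConfig": {"responseModalities": ["TEXT", "IMAGE"]},
  }
  resp = requests.post(
      "https://generativelanguage.googleapis.com/v1beta/models/"
      "gemini-2.0-flash-exp-image-generation:generateContent",
      params={"key": os.environ["GEMINI_API_KEY"]},
      json=body,
      timeout=120,
  )
  resp.raise_for_status()
  # Image bytes come back as base64 "inlineData" parts in resp.json().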

mvdtnz 2 days ago

I gave this a crack this morning, trying something very similar to the examples. I tried to get Gemini 2.0 Preview to add a set of bi-fold doors to a picture of a house in a particular place. It failed completely. It put them in the wrong place, they looked absolutely hideous (like I had pasted them in with MS Paint) and the more I tried to correct it with prompts the worse it got. At one point when I re-prompted it, it said

> Okay, I understand. You want me to replace ONLY the four windows located underneath the arched openings on the right side of the house with bifold doors, leaving all other features of the house unchanged. Here is the edited image:

Followed by no image. This is a behaviour I have seen many times from Gemini in the past so it's frustrating that it's still a problem.

I give this a 0/10 for my first use case.

ohadron 3 days ago

For one thing, it's way faster than the OpenAI equivalent in a way that might unlock additional use cases.

  • freedomben 3 days ago

    Speed has been the consistent thing I've noticed with Gemini too, even going back to the earlier days when Gemini was a bit of a laughing stock. Gemini is fast

  • julianeon 3 days ago

    I don't know exactly the speed/quality tradeoff but I'll tell you this: Google may be erring too much on the speed side. It's fast but junk. I suspect a lot of people try it then bounce off back to Midjourney, like I did.

pentagrama 3 days ago

I want to take a step back and reflect on what this actually shows us. Look at the examples Google provides: it refers to the generated objects as "products", clearly pointing toward shopping or e-commerce use cases.

It seems like the real goal here, for Google and other AI companies, is a world flooded with endless AI-generated variants of objects that don’t even exist yet, crafted to be sold and marketed (probably by AI too) to hyper-targeted audiences. This feels like an incoming wave of "AI slop", mass-produced synthetic content, crashing against the small island of genuine human craftsmanship and real, existing objects.

  • hapticmonkey 2 days ago

    It's sort of sad how these tools went from "godlike new era of human civilization" to "some commodity tools for marketing teams to sell stuff".

    I get that they are trying to find some practical use cases for their tools. But there's no enlightenment in the product development here.

    If this is already the part of the s-curve where these AI tools get diminishing returns...what a waste of everybody's time.

  • nly 2 days ago

    Recently I've been seeing a lot of holiday lets on sites like Rightmove (UK) and Airbnb with clearly AI generated 'enhancements' to the photos.

    It should be illegal in my view.

  • vunderba 3 days ago

    Yeah - and honestly I don't really get this. Using GenAI for real-world products seems like a recipe for a slew of fraudulent-advertising lawsuits if the images are slightly different from the actual physical products yet presented as if they are real photographs.

  • nkozyra 3 days ago

    The gating factor here is the pool of consumers. Once people have slop exhaustion there's nobody to sell this to.

    Maybe this is why all of the future AI fiction has people dressed in the same bland clothing.

egamirorrim 3 days ago

I don't understand how to use this. I keep trying to edit a photo of myself (change a jacket to a t-shirt) in the Gemini app with 2.0 Flash selected, and it just generates a new image that's nothing like the original.

  • FergusArgyll 3 days ago

    I think this is just in AI Studio. In the Gemini app I think it goes: Flash describes the image to imagen -> imagen generates a new image

  • thornewolf 3 days ago

    It is very sensitive to your input prompts. Minor differences will result in drastic quality differences.

  • julianeon 3 days ago

    Remember you are paying about 4 cents an image if I'm understanding the pricing correctly.

qq99 3 days ago

Wasn't this already available in AI Studio? It sounds like they also improved the image quality. It's hard to keep up with what's new with all these versions

voidUpdate 2 days ago

Example 1 doesn't actually show how the lamp would look in that situation... in the first image it's about the same height as the sofa, so I'd expect it to be at least twice the size it is in the second image. Also, what is going on underneath the table?

  • mastazi 2 days ago

    LLMs are notoriously bad at estimating or comparing the sizes of physical objects because their training happens away from the physical world. It's a well-known problem that has recently been discussed even in popular media, e.g. https://theconversation.com/why-ai-cant-ever-reach-its-full-...

    • voidUpdate 2 days ago

      Probably not the best idea to try and use that as a demo of how awesome your new image generator is then

  • yots 2 days ago

    That table image is... horribly honest? I'm a bit shocked they used it in a blog post. It's really, really bad

taylorhughes 3 days ago

Image editing/compositing/remixing is not quite as good as gpt-image-1, but the results are really compelling anyway due to the dramatic increase in speed! Playing with it just now, it's often 5 seconds for a compositing task between multiple images. Feels totally different from waiting 30s+ for gpt-image-1.

Tsarp 2 days ago

There are direct prompt tests, and then there are tests with tooling.

If, for example, you use ControlNets, you can get very close to the style composition you need with an open model like Flux, which will be far better. Flux has a few successors coming up now.

emporas 2 days ago

I use Gemini to create covers for songs/albums I make, with beautiful typography. Something like this [1]. I was dying of curiosity about how Ideogram managed to create such gorgeous images. I figured it out 2 days ago.

I take an image with some desired colors or typography from an already existing music album or from Ideogram's poster section. I pass it to Gemini and give the command:

"describe the texture of the picture, all the element and their position in the picture, left side, center right side, up and down, the color using rgb, the artistic style and the calligraphy or font of the letters"

Then I take the result and pass it through a different LLM, because I don't like Gemini that much; I find it is much less coherent than other models. I usually use qwen-qwq-32b. I take the description Gemini outputs and give it to Qwen:

" write a similar description, but this time i want a surreal painting with several imaginative colors. Follow the example of image description, add several new and beautiful shapes of all elements and give all details, every side which brushstrokes it uses, and rgb colors it uses, the color palette of the elements of the page, i want it to be a pastel painting like the example, and don't put bioluminesence. I want it to be old style retro style mystery sci fi. Also i want to have a title of "Song Title" and describe the artistic font it uses and it's position in the painting, it should be designed as a drum n bass album cover "*

Then I take the result and give it back to Gemini with the command: "Create an image with text "Song Title" for an album cover: here is the description of the rest of the album"

If the resulting image is good, then it is time to add the font. I take the new image description and pass it through Qwen again, supposing the image description has the fields Title and Typography:

"rewrite the description and add full description of the letters and font of text, clean or distressed, jagged or fluid letters or any other property they might have, where they are overlayed, and make some new patterns about the letter appearance and how big they are and the material they are made of, rewrite the Title and Typography."

I replace the previous description's section Title and Typography with the new description and create images with beautiful fonts.

[1] https://imgur.com/a/8TCUJ75
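
In code terms, the chain is roughly this (just a sketch; call_gemini and call_qwen are hypothetical placeholders for however you invoke the two models, and the prompts are abbreviated versions of the ones quoted above):

  # Sketch of the 4-prompt chain; call_gemini / call_qwen are placeholders.
  def call_gemini(prompt, image_path=None):
      raise NotImplementedError("placeholder: Gemini via API, AI Studio, ...")

  def call_qwen(prompt):
      raise NotImplementedError("placeholder: qwen-qwq-32b via your provider")

  def make_cover(reference_image):
      # 1. Gemini describes the reference: elements, positions, RGB colors, fonts.
      description = call_gemini("describe the texture of the picture ...", reference_image)
      # 2. Qwen rewrites it as a new surreal description that includes the title.
      surreal = call_qwen("write a similar description, but ...\n\n" + description)
      # 3. Gemini renders a first cover from that description.
      call_gemini('Create an image with text "Song Title" for an album cover: ' + surreal)
      # 4. Qwen rewrites just the Title and Typography section; paste it over the
      #    old section of the description by hand, then render again.
      typography = call_qwen("rewrite the description and add full description of the letters ...\n\n" + surreal)
      final_description = surreal  # with its Title/Typography section replaced by `typography`
      return call_gemini('Create an image with text "Song Title" for an album cover: ' + final_description)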

  • reneherse 2 days ago

    Thanks for sharing your process. That example is some of the best gen art I've seen.

    • emporas a day ago

      Thanks. This workflow of 4 prompts has the benefit of not using the mouse.

      I have a friend who uses Photoshop to make posters for bands. The resulting images are better, and faces of real people are put in the poster, but he does 1 million clicks every time. I use only Emacs to make the image: a much faster, more relaxing workflow. I just edit text most of the time.

      Gemini's image generation abilities, especially regarding typography, are in the same ballpark as Ideogram's. Ideogram is a little bit better sometimes (vertical text, for example, trips up Gemini), but Gemini, being natively multimodal, works very well with text descriptions of images.

      Ideogram has an upper limit on the total number of tokens it can accept as text input. It is not natively multimodal as far as I know.

jansan 3 days ago

Some examples are quite impressive, but the one with the ice bear on the white mug is very underwhelming and the co-drawing looks like it was hacked together by a vibe coder.

  • cyral 2 days ago

    It looks like those horribly edited gift mugs I see on Amazon occasionally, where someone just puts the image over the mug without accounting for the 3D shape (too many variants to actually photograph each one). It would have been an excellent example of how much better AI is if they had made it handle that properly.

  • thornewolf 3 days ago

    The co-drawing is definitely not a fully fleshed-out product or anything but I think it is a great tech demo. What don't you like about it?

adverbly 3 days ago

Google totally crushing it and stock is down 8% today :|

Is it just me or is the market just absolutely terrible at understanding the implications and speed of progress behind what's happening right now in the walls of big G?

  • abirch 3 days ago

    A potential reason that GOOG is down right now is that Apple is looking at AI Search Engines.

    https://www.bloomberg.com/news/articles/2025-05-07/apple-wor...

    Although AI is fun and great, an AI search engine may have trouble being profitable. It's similar to how 23andMe got many customers by selling a $500 test to people for $100.

    • xnx 3 days ago

      Would be quite a financial swing for Apple from getting paid billions of dollars by Google for search to having to spend billions of dollars to make their own.

      • abirch 3 days ago

        From the article Eddy Cue is Apple’s senior vice president of services. "Cue said he believes that AI search providers, including OpenAI, Perplexity AI Inc. and Anthropic PBC, will eventually replace standard search engines like Alphabet’s Google. He said he believes Apple will bring those options to Safari in the future."

        So Apple may not be making their own, but they won't be spending billions either. I'm wondering how these AI search providers will be able to monetize the searches so that they make money.

        • mattlondon 3 days ago

          FWIW I searched this story not long after it broke and Google - yes, the traditional "old school search engine" - had an AI-generated summary of the story with a breakdown of the whys and hows right there at the top of the page. This was basically real time, give or take 10 minutes.

          I am not sure why people think OpenAI et al. are going to eat Google's lunch here. It seems like they're already doing AI-for-search, and if there is anyone who can do it cheaply and at scale, I bet on Google being the one to do it (with all their data centers, data integrations/crawlers, custom hardware, and experience). I doubt some startup using the Bing index and renting off-the-shelf Nvidia hardware with investor funds is going to leapfrog Google-scale infrastructure and expertise.

  • resource_waste 3 days ago

    Why would any of this have an impact on stock prices?

    LLMs are insanely competitive and a dime a dozen now. Most professional uses can get away with local models.

    This is image generation... Niche cases in another saturated market.

    How are any of these supposed to make google billions of dollars?

  • lenerdenator 3 days ago

    The market is absolutely terrible at a lot of things.