
Because Imagen 3 is a text-to-image model, not an image-to-image model, the inputs have to be some form of text. Multimodal models such as 4o image generation or Gemini 2.0, which can take both text and image inputs, do encode image inputs into a latent space through a Vision Transformer, but not reversibly or losslessly.
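As a rough sketch (not Google's actual code; the patch size and embedding dimension are assumed ViT-Base defaults), the front end of a Vision Transformer chops the image into patches and projects each one to a latent token. Nothing downstream is trained to invert this mapping back to pixels, which is why the encoding is neither reversible nor lossless:

    # Assumed, illustrative ViT-style patch embedding (PyTorch).
    import torch
    import torch.nn as nn

    patch_size, embed_dim = 16, 768

    # Each 16x16x3 patch is linearly projected to a 768-d token.
    # Even where dimensionality happens to match, the learned projection
    # and the attention layers after it are not trained to be invertible.
    patch_embed = nn.Conv2d(3, embed_dim, kernel_size=patch_size, stride=patch_size)

    image = torch.randn(1, 3, 224, 224)         # dummy input image
    tokens = patch_embed(image)                  # (1, 768, 14, 14)
    tokens = tokens.flatten(2).transpose(1, 2)   # (1, 196, 768) latent tokens

    # Transformer blocks then mix these tokens; there is no decoder
    # that recovers the original pixels exactly.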


Generative models, particularly diffusion models like Imagen 3, are typically easy to architect with several entry points into the model's latent space. Imagen 3 is not open source, so there may be an architectural reason I cannot see, but the public interface to the model shouldn't be taken as a reflection of its underlying capabilities -- among open-source image generation models, for example, it is uncommon for image-to-image not to be supported. There are, however, clear legal reasons not to expose such an entry point in a public-facing model like Imagen 3.
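To make that concrete, here is roughly what the image-to-image entry point looks like in an open model, using Stable Diffusion via Hugging Face's diffusers library as a stand-in for the closed Imagen 3 (the prompt, file names, and strength value are just illustrative):

    # Illustrative img2img with an open diffusion model.
    import torch
    from diffusers import StableDiffusionImg2ImgPipeline
    from PIL import Image

    pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5",
        torch_dtype=torch.float16,
    ).to("cuda")

    init_image = Image.open("input.png").convert("RGB").resize((512, 512))

    # strength controls how far the init image is pushed into noise
    # before denoising under the prompt: near 0.0 barely changes the
    # input, 1.0 is effectively pure text-to-image.
    result = pipe(
        prompt="a watercolor painting of the same scene",
        image=init_image,
        strength=0.6,
    ).images[0]
    result.save("output.png")

The point is that the conditioning image enters through the same latent space the text-to-image path uses; exposing it is an interface decision, not an architectural one.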


And Gemini gave my statement the yes-man treatment here :D "In summary: Your assessment aligns well with the technical realities of diffusion models and the practical, legal, and safety considerations large companies face when deploying powerful generative AI tools publicly. It's entirely feasible that Imagen 3's underlying architecture could support image inputs, but Google has chosen not to expose this capability publicly due to the associated risks and complexities."



