Interestingly, the thumbnail gives it away immediately.
Those kittens have been hanging out with these puppies.
I am a bit impressed they were able to integrate the message. I wouldn’t have thought an AI would be able to do this yet.
What’s the prompt to get this stuff done? This looks wild and interesting.
I suspect these are using additional tools to guide the AI beyond a simple prompt. For example, the spiraling medieval village was generated with Stable Diffusion and ControlNet.
I think the prompt is not much more than “puppies” and “kittens”. In some AIs, the major, middle and minor features of an image can be controlled individually (they can be separated using a Fourier transform or Gaussian convolutions and fed into different discriminators), so I think:
- major features (scenery) are controlled by the prompt (grass or couch)
- middle features (text) are a source image that the AI is punished for straying from
- minor features (details) are controlled by the prompt (faces and fur)
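To make the major / middle / minor split a bit more concrete, here is a rough sketch of separating an image into three frequency bands with Gaussian blurs applied through the FFT. It only illustrates the separation step, not how any particular generator works. The function names, the sigma values and the random stand-in image are all made up for illustration.

```python
import numpy as np

def gaussian_lowpass(img, sigma):
    """Blur a 2-D array with a Gaussian of std `sigma` pixels via the FFT.

    Note: FFT filtering wraps around at the image borders.
    """
    fy = np.fft.fftfreq(img.shape[0])[:, None]  # vertical frequencies, cycles/pixel
    fx = np.fft.fftfreq(img.shape[1])[None, :]  # horizontal frequencies, cycles/pixel
    # The Fourier transform of a Gaussian is again a Gaussian.
    transfer = np.exp(-2.0 * (np.pi * sigma) ** 2 * (fx ** 2 + fy ** 2))
    return np.real(np.fft.ifft2(np.fft.fft2(img) * transfer))

def split_bands(img, sigma_coarse=16.0, sigma_fine=2.0):
    """Split `img` into (low, mid, high) bands that sum back to `img`."""
    low = gaussian_lowpass(img, sigma_coarse)       # "major" features: overall scene
    low_plus_mid = gaussian_lowpass(img, sigma_fine)
    mid = low_plus_mid - low                        # "middle" features: text-sized shapes
    high = img - low_plus_mid                       # "minor" features: fine detail
    return low, mid, high

# Stand-in for a grayscale image; a real photo would be loaded instead.
rng = np.random.default_rng(0)
image = rng.random((64, 64))
low, mid, high = split_bands(image)
```

Because the three bands add back up to the original, each one could in principle be steered or penalized separately, which is the idea described above.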
Or it’s just Stable Diffusion that starts from an image of the text rather than from random noise.
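If it is that second route, the starting point would be the text image mixed with noise instead of pure noise. Here is a small sketch of the standard forward-noising mix, sqrt(alpha_bar) * image + sqrt(1 - alpha_bar) * noise. The bar-shaped mask standing in for rendered text and the two alpha_bar values are arbitrary assumptions, and real pipelines do this in latent space.

```python
import numpy as np

def noised_start(init, alpha_bar, rng):
    """Mix an initial image with Gaussian noise, diffusion-style.

    alpha_bar near 1 keeps most of the image; near 0 it is almost pure noise.
    """
    noise = rng.standard_normal(init.shape)
    return np.sqrt(alpha_bar) * init + np.sqrt(1.0 - alpha_bar) * noise

def correlation(a, b):
    """Pearson correlation between two arrays, flattened."""
    return float(np.corrcoef(a.ravel(), b.ravel())[0, 1])

rng = np.random.default_rng(0)

# Hypothetical stand-in for rendered text: a white bar on black, scaled to [-1, 1].
text_mask = np.zeros((64, 64))
text_mask[24:40, 8:56] = 1.0
init = 2.0 * text_mask - 1.0

mild = noised_start(init, alpha_bar=0.5, rng=rng)    # text still clearly present
heavy = noised_start(init, alpha_bar=0.05, rng=rng)  # text mostly drowned out

print("mild :", correlation(init, mild))
print("heavy:", correlation(init, heavy))
```

The less noise is mixed in, the more of the text layout survives into the final picture, at the cost of giving the model less freedom to paint kittens over it.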
How do they make these?!
Stable Diffusion together with ControlNet. You basically feed it the text as a black-and-white image and give it a description of the picture of cats. It will then generate this output using the black-and-white image as a base. It’s fairly simple to do, but it can take a while to get a quality result such as this one.
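For anyone who wants to try it locally, here is a minimal sketch of that workflow using Pillow and the Hugging Face diffusers library. Some assumptions: I don't know which checkpoints the original poster used, so `base_model_id` and `controlnet_id` are left as arguments you have to fill in yourself. The helper names, the 512x512 size and the step count are just illustrative, and the generation part assumes a CUDA GPU.

```python
from PIL import Image, ImageDraw, ImageFont

def make_text_control_image(text, size=(512, 512), margin=48):
    """Render `text` as white letters on black, scaled to fill the canvas."""
    font = ImageFont.load_default()
    # Draw at the font's native size on a scratch canvas, then crop to the ink.
    scratch = Image.new("L", (10 * len(text) + 20, 40), 0)
    ImageDraw.Draw(scratch).text((5, 5), text, fill=255, font=font)
    glyphs = scratch.crop(scratch.getbbox())
    # Force pure black and white (any ink becomes white).
    glyphs = glyphs.point(lambda v: 255 if v else 0)
    # Scale the letters up so they fit inside the margins.
    max_w, max_h = size[0] - 2 * margin, size[1] - 2 * margin
    scale = min(max_w / glyphs.width, max_h / glyphs.height)
    new_size = (max(1, int(glyphs.width * scale)), max(1, int(glyphs.height * scale)))
    glyphs = glyphs.resize(new_size, Image.NEAREST)
    canvas = Image.new("L", size, 0)
    canvas.paste(glyphs, ((size[0] - new_size[0]) // 2, (size[1] - new_size[1]) // 2))
    return canvas.convert("RGB")

def generate(prompt, control_image, base_model_id, controlnet_id, strength=1.0):
    """Run Stable Diffusion guided by a ControlNet (needs torch, diffusers, a GPU).

    `base_model_id` and `controlnet_id` are placeholders for whichever
    checkpoints you choose; none are implied by the post above.
    """
    import torch
    from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

    controlnet = ControlNetModel.from_pretrained(controlnet_id, torch_dtype=torch.float16)
    pipe = StableDiffusionControlNetPipeline.from_pretrained(
        base_model_id, controlnet=controlnet, torch_dtype=torch.float16
    ).to("cuda")
    result = pipe(
        prompt,
        image=control_image,
        num_inference_steps=30,
        controlnet_conditioning_scale=strength,  # how strongly the text shapes the image
    )
    return result.images[0]

# Build the black-and-white base image; this part runs without a GPU.
control = make_text_control_image("LOVE")
control.save("control.png")
```

You would then call something like `generate("a pile of kittens on a couch", control, base_model_id=..., controlnet_id=...)` and play with `strength` and the seed until the text is readable but not too obvious. That tuning is where most of the time goes.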
That sounds pretty complicated. I’ve just been plugging the text and description into this site: https://glif.app/@fab1an/glifs/clmqp99820001jn0f2xywz250
These images are so fascinating.