>>33357482A neural network trained on over 5 billion images uses diffusion to generate a bunch of random noise and then hallucinates an image over multiple steps based on the text you input, using its semantic understanding of text/image pairs to create what you describe.
It's still black magic to 99% of people.