>>2230904
It's extremely difficult for neural networks to learn complex functions, even though it's theoretically possible with enough neurons. If you can break the process down into irreducible components that work together, the neural network will train much faster, with less data, and produce much better results.
Read the paper on Progressive Growing of GANs. When you start by generating a tiny picture, the network will quickly determine if a pixel is a bikini or unclothed. The next layer will figure out if it's kinda rotated this way or that way. Then the next will refine its shape, the next one will put finer details in, and so on. The paper explains how to train the first layer and gradually add new network layers on top of it.
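Rough idea of how a new layer gets added gradually: the paper fades it in by blending the upsampled output of the old low-res stage with the output of the new higher-res layer, with a weight that ramps from 0 to 1 over training. A minimal NumPy sketch of just that blend, not the paper's code. The function names and the random arrays standing in for layer outputs are made up for illustration.

```python
import numpy as np

def upsample_nearest(img, factor=2):
    """Nearest-neighbour upsampling of an (H, W, C) array."""
    return img.repeat(factor, axis=0).repeat(factor, axis=1)

def fade_in(low_res_out, new_layer_out, alpha):
    """Blend the old stage's upsampled output with the new layer's output.

    alpha = 0 -> only the old (already trained) low-res stage is used
    alpha = 1 -> only the newly added high-res layer is used
    """
    upsampled = upsample_nearest(low_res_out)
    return (1.0 - alpha) * upsampled + alpha * new_layer_out

# Stand-ins for real generator outputs (assumption: random data).
rng = np.random.default_rng(0)
low = rng.random((4, 4, 3))    # output of the existing 4x4 stage
high = rng.random((8, 8, 3))   # output of the newly added 8x8 layer

blended = fade_in(low, high, alpha=0.25)
print(blended.shape)
```

In training you'd step alpha up a little every batch, so the new layer never shocks the parts that are already trained.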
StackGAN does a second pass for refinement: it downsamples the first stage of generation into feature maps, spatially replicates the image description vector to the size of those downsampled feature maps, and combines the two sets of feature maps to produce the second stage of generation. StackGAN-v2 improved on this process by breaking it down into a repeatable structure of multiple generators and discriminators that continually refine and improve the image.
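The "spatially replicated" part just means copying the vector to every spatial position so it can be stacked with the feature maps along the channel axis. A minimal NumPy sketch, assuming a channels-first (C, H, W) layout. The function name and the sizes are hypothetical, not taken from the StackGAN code.

```python
import numpy as np

def replicate_and_concat(feature_maps, cond_vector):
    """Tile a (D,) vector over H x W and stack it onto (C, H, W) feature maps.

    Returns an array of shape (C + D, H, W).
    """
    _, h, w = feature_maps.shape
    d = cond_vector.shape[0]
    # (D,) -> (D, 1, 1) -> (D, H, W): the same vector at every position.
    tiled = np.broadcast_to(cond_vector[:, None, None], (d, h, w))
    return np.concatenate([feature_maps, tiled], axis=0)

# Stand-ins (assumption: random data, arbitrary sizes).
rng = np.random.default_rng(0)
feats = rng.random((16, 8, 8))   # downsampled first-stage feature maps
desc = rng.random(4)             # image description vector

combined = replicate_and_concat(feats, desc)
print(combined.shape)
```

The second-stage generator then runs its convolutions over the combined maps, so every position sees both the local image features and the description.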
In place of StackGAN's image description vector, you could probably drop in ALI (Adversarially Learned Inference) to produce latent feature vectors of the input/target images and spatially replicate them for use in the second stage of generation. ALI and Hierarchical ALI are quite good at disentangling features in a way that produces more sensible results across the entire latent space.
For the second pass, downsampling is really important to reduce the dimensionality, or you'll just end up with a deeper neural network that's harder to train due to vanishing gradients. You can mitigate this, though, by combining the deep convolutional net with residual connections, as in residual nets.
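The reason residual nets help: each block outputs its input plus a learned correction, so there's always an identity path for the signal (and the gradient) to pass through. A minimal NumPy sketch of one block's forward pass, using dense layers instead of convolutions to keep it short. The names and sizes are hypothetical.

```python
import numpy as np

def residual_block(x, w1, w2):
    """Forward pass of a two-layer block with an identity skip connection.

    x: (N, D), w1: (D, H), w2: (H, D). Output has the same shape as x.
    """
    hidden = np.maximum(x @ w1, 0.0)   # ReLU
    return x + hidden @ w2             # skip connection adds the input back

# Stand-ins (assumption: random data, arbitrary sizes).
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))
w1 = rng.normal(size=(8, 16))
w2_zero = np.zeros((16, 8))

# With the last layer zeroed, the block is exactly the identity, so
# stacking many such blocks doesn't block the signal.
out = residual_block(x, w1, w2_zero)
print(np.allclose(out, x))
```

In a real generator you'd swap the matrix multiplies for conv layers, but the skip connection works the same way.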
Also, the adversarial game should produce both clothed and unclothed versions, if yours isn't doing that already. This will help disentangle character features from bikinis.