>>70541408
My main reliance is Scale Weight Norms at 1, which prevents overtraining. SWN adds two new parameters to your terminal output while everything is training. The first is Average Key Norm, which is used to identify when a threshold has been reached (for SWN=1 this is usually around 0.11-0.12). Once the threshold has been reached, the training script will start scaling back keys if they become overfit, which is shown by the second parameter, "Keys Scaled".
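If you're on kohya's sd-scripts, this should just be the --scale_weight_norms flag (e.g. --scale_weight_norms=1.0 for the value I use); that's what makes the two readouts show up in the terminal.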
The sweet spot from my experience is to reach the threshold and start getting scaled keys around the halfway point of the training, and to make sure the number of keys scaled doesn't climb too far into double digits (12+).
If the threshold is not reached by the end of training, the model is undertrained; if the scaled keys go above 15, you should lower the learning rate and take it slower instead. Even with 20 keys scaled the end result will not be completely baked, but there is no point in going that hard and then having to lower the lora strength to 0.8 to make it do stuff.
I already explained some of these in my "how to train loras with small datasets" rentry, and you can find a script for 1.5 there. Roughly the same settings apply to XL as well, but with an increased resolution and a smaller batch size: dim 16, alpha 4, cosine, no warmup, unet lr between 0.0001 and 0.00035 (for lumi), and I train the text encoder as well at 0.00004:
https://rentry.org/CCC_Training
I will try to update it tomorrow with SDXL settings as well.
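In the meantime, those XL settings translate roughly to something like this with kohya's sdxl_train_network.py (the model/dataset paths, batch size, epoch count and resolution below are placeholders I'd adjust per dataset, not fixed values):

# sketch only: paths, batch size, epochs and resolution are placeholders, the lrs/dim/alpha/SWN are the values from above
accelerate launch sdxl_train_network.py \
  --pretrained_model_name_or_path="/path/to/sdxl_checkpoint.safetensors" \
  --train_data_dir="/path/to/dataset" \
  --output_dir="/path/to/output" --output_name="my_lora" \
  --network_module=networks.lora --network_dim=16 --network_alpha=4 \
  --unet_lr=0.0001 --text_encoder_lr=0.00004 \
  --lr_scheduler=cosine --lr_warmup_steps=0 \
  --scale_weight_norms=1.0 \
  --resolution=1024,1024 --train_batch_size=2 \
  --max_train_epochs=10 --save_model_as=safetensors \
  --mixed_precision=bf16 --cache_latents --enable_bucket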