>>66961876
As long as there are no big loss fluctuations (as in an order of magnitude) and it doesn't go to NaN, I've found the loss value pretty irrelevant for LoRAs. You can test this by simply frying one and observing how the loss still fluctuates at the same rate even when the LoRA was perfectly cooked 20 epochs prior.
I've mentioned Scale Weight Norms before and I'll keep doing it, since I've seen how much it can help keep LoRAs from frying, up to a point.
Another nice use of it is keeping track of the LoRA's training. If you enable it, your terminal will start showing two new values next to the progress bar: the average weight norm and "Keys Scaled". I couldn't find an explanation for them anywhere, but Keys Scaled is essentially a count of how many weights (keys) would have been fried if they hadn't been held back by the Scale Weight Norms setting (Max Norm Regularization).
From my testing, having a few scaled keys for a few epochs is fine, since it scales those overcooked weights back while letting other concepts keep training in the meantime, but if the number stays high for 10+ epochs, the LoRA will probably end up with more and more elements fried.
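For anyone curious what "scaling back" actually means, here's a minimal sketch of the idea behind max norm regularization on a LoRA key pair. This is an illustration in plain PyTorch, not the actual sd-scripts code; the threshold just mirrors what you'd pass to something like --scale_weight_norms, and the shapes/names are made up for the example.

```python
# Minimal sketch of max norm regularization ("scale weight norms") on LoRA keys.
# Assumes a standard LoRA pair (lora_down, lora_up); not the real training-script code.
import torch

def apply_max_norm(lora_down: torch.Tensor, lora_up: torch.Tensor,
                   max_norm: float = 1.0) -> bool:
    """Scale a LoRA key pair back in place if its combined norm exceeds max_norm.

    Returns True if the key was scaled; counting these is what shows up as "Keys Scaled".
    """
    # The effective update this key contributes is up @ down, so measure that norm.
    norm = (lora_up @ lora_down).norm()
    if norm > max_norm:
        # Split the correction between both matrices so their product shrinks to max_norm.
        ratio = (max_norm / norm).sqrt()
        lora_down.mul_(ratio)
        lora_up.mul_(ratio)
        return True
    return False

# Toy usage: count how many keys got scaled back this step.
keys = [(torch.randn(4, 320), torch.randn(320, 4)) for _ in range(10)]
keys_scaled = sum(apply_max_norm(down, up, max_norm=1.0) for down, up in keys)
print(f"Keys Scaled: {keys_scaled}")
```

The point is that a key that's about to fry doesn't get zeroed out, it just gets pulled back under the cap, which is why a few scaled keys per epoch is harmless while a consistently high count means the training is pushing too hard.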
In any case, you can use the number to get a rough idea of how trained / how close to frying your LoRA is. For characters I've found that around 6-7 keys scaled for 2 epochs is fine and still gets enough contrast and minor detail in. If it gets into the 15+ range, the LR is probably too high or I should have stopped training sooner.
In picrel the LoRA was pretty much perfectly trained by epoch 7 with 6 scaled keys, and there are no major differences after that. But epoch 10 is just as usable because the weights have been scaled back every epoch instead of being allowed to fry too much.
Holy wall of text! Sorry!