>>66734811
Usually 1000 steps, but on XL I see some that fit way quicker while others just slump midway and have to be restarted with a higher lr. This is where scale weight norm comes in handy, because you can clearly see in the terminal when the model stops training midway: the average key norm stops increasing.
This one was already fit by 600 steps, and by 800 it had stopped learning anything other than contrast and minor details while baking in the outfit.
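
If you don't want to eyeball the terminal, here's a rough sketch of the same check in Python. It assumes you've written down the average key norm every so many steps as (step, value) pairs. The function name, the window, the 1% threshold and the numbers are all made up for illustration, not taken from a real run or from any trainer's code.

```python
def plateau_step(readings, window=3, min_gain=0.01):
    """Return the first step where the norm grew by less than `min_gain`
    (relative) compared to the reading `window` entries earlier, else None.

    `readings` is a list of (step, average_key_norm) pairs in step order.
    All names and defaults here are hypothetical.
    """
    for i in range(window, len(readings)):
        step, value = readings[i]
        _, earlier = readings[i - window]
        # Relative growth over the last `window` readings.
        if earlier > 0 and (value - earlier) / earlier < min_gain:
            return step
    return None


# Made-up readings, only to show the shape of the input.
readings = [
    (100, 0.10), (200, 0.18), (300, 0.25), (400, 0.30), (500, 0.33),
    (600, 0.345), (700, 0.346), (800, 0.346), (900, 0.346),
]

stalled_at = plateau_step(readings)
print("norm stopped growing around step", stalled_at)
```

The check lags by `window` readings, so the step it reports is a bit later than where the norm actually flattened. A flat norm only says the weights stopped growing, so you still have to look at sample images to tell "already fit" apart from "slumped and needs a restart".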