>>720542001
One epoch in and the model is already breaking apart, with the trainer scaling back keys. I think a 0.0001 LR was too high for this thing. I probably need 3e-5 to make it to 5 epochs.
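Rough sketch of the two knobs in question, assuming a kohya-style trainer where scale_weight_norms clamps any LoRA key whose weight norm grows past a cap; names here are illustrative, not the actual training script:

```python
# Minimal sketch, not the real trainer: assumed kohya-style weight-norm
# scaling plus the LR change I'm planning for the next run.
import torch
from torch.optim import AdamW

# Stand-in LoRA weights; in a real run these come from the network.
lora_params = [torch.nn.Parameter(torch.randn(4, 320) * 0.01)]

# The proposed fix: drop the LR from 1e-4 to 3e-5.
optimizer = AdamW(lora_params, lr=3e-5)

def scale_back_key(weight: torch.Tensor, max_norm: float = 1.0) -> bool:
    """Scale a key's weights back down if its norm has blown past the cap."""
    norm = weight.detach().norm()
    if norm > max_norm:
        weight.data.mul_(max_norm / norm)
        return True  # this key got scaled back; lots of these = LR too hot
    return False

scaled = sum(scale_back_key(p) for p in lora_params)
print(f"keys scaled this step: {scaled}")
```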
However, I am seeing some improvements. I'll have to test the LoRA myself instead of just going by the sample images, though.
https://files.catbox.moe/pgcmcy.png
https://files.catbox.moe/6ttkno.png