hll4 should be ready in a few days.
It's taking so long because I was busy with IRL stuff at first, and then I was experimenting with dataset preparation scripts, optimizer settings, and various training-related tricks.
So far it seems that with Lion you can train ~3x faster if you start with 1-2 epochs at a very high LR and then resume at a normal LR. Adam is more prone to breaking at high LR.
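For anyone who wants to try the high-LR-then-normal-LR idea, here's a rough sketch of the schedule as a single function of epoch. This is just one way to express it, not necessarily what was used for hll4. It assumes the resumed run uses cosine decay starting from the normal LR (the post doesn't say which scheduler the second phase uses). The function name, parameters, and example values are all hypothetical.

```python
import math


def two_phase_lr(epoch: float, high_lr: float, normal_lr: float,
                 high_epochs: float, total_epochs: float) -> float:
    """Hypothetical two-phase LR schedule (illustrative sketch only).

    Phase 1: constant high LR for the first `high_epochs` epochs.
    Phase 2 (assumed): cosine decay from `normal_lr` down to 0 over the
    remaining epochs, as if training were resumed with a fresh cosine scheduler.
    """
    if total_epochs <= high_epochs:
        raise ValueError("total_epochs must be greater than high_epochs")
    if epoch < high_epochs:
        return high_lr
    # Fraction of the second phase completed, clamped to [0, 1]
    progress = min((epoch - high_epochs) / (total_epochs - high_epochs), 1.0)
    return 0.5 * normal_lr * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    # Example values are made up: 2 epochs at 1e-3, then cosine from 1e-4 over 8 epochs
    for e in range(11):
        print(e, two_phase_lr(e, high_lr=1e-3, normal_lr=1e-4,
                              high_epochs=2, total_epochs=10))
```

In practice you'd more likely do what the post describes, running phase 1 as its own job and then resuming from the checkpoint with the normal LR and scheduler. Folding both phases into one function just makes the shape of the schedule easy to see.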
Picrel: intentionally high LR with a normal cosine scheduler.