Tried training using a timestep distribution from /h/. It really works!
>timesteps = torch.sigmoid(torch.normal(mean=-0.3, std=1, size=(b_size,), device=device))
>timesteps = (max_timestep * timesteps).long()
1st lora was trained for 2000 steps (batch size 8) with weird settings - it learned the characters and the output looks interesting and colorful
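If anyone wants to drop this in, here's my reading of it as a self-contained sketch. The quote doesn't define b_size, max_timestep, or device, so the values below are just placeholders:

import torch

# sigmoid of a normal draw gives a logit-normal distribution over (0, 1),
# so sampled timesteps bunch up around the middle of the schedule instead
# of being uniform; mean=-0.3 shifts the mass toward earlier timesteps
b_size = 8             # placeholder: batch size, same as the runs above
max_timestep = 1000    # placeholder: length of the noise schedule
device = "cuda"

timesteps = torch.sigmoid(torch.normal(mean=-0.3, std=1.0, size=(b_size,), device=device))
timesteps = (max_timestep * timesteps).long()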
2nd lora was trained for 6000 steps (batch size 8) with normal settings - it learned the characters slightly better, but the colors suck and training took 3x longer
both versions are slightly fried from using Adan with a high LR
https://litter.catbox.moe/h55fdy.jpg
https://litter.catbox.moe/2h1w3e.jpg