>>95312069
You should be using 0.3 if you actually want to use it. But in my opinion, all of the tunes are brain-damaged, because the community is too lazy to actually implement fine-tuning properly. DeepSeek explicitly spells it out:
>For distilled models, we apply only SFT and do not include an RL stage, even though incorporating RL could substantially boost model performance. Our primary goal here is to demonstrate the effectiveness of the distillation technique, leaving the exploration of the RL stage to the broader research community.
But no, slop merges and LoRAs are easier, so most people who have no idea what they are doing would rather pump out shit and hacks than do the right thing that is literally spelled out for them.
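For reference, a minimal sketch of what that missing RL stage could look like on top of a distilled checkpoint, using TRL's GRPOTrainer. The dataset, reward function, and output path here are placeholders, not anything DeepSeek published; a real run would swap in your own prompts and a proper verifier-based reward.

```python
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

# Placeholder dataset: any prompt dataset with a "prompt" column works.
dataset = load_dataset("trl-lib/tldr", split="train")

# Placeholder reward: prefers completions near a target length.
# A real RL stage would score with a verifier (math checker, unit tests, etc.).
def reward_len(completions, **kwargs):
    return [-abs(200 - len(c)) for c in completions]

training_args = GRPOConfig(
    output_dir="r1-distill-grpo",   # hypothetical output path
    per_device_train_batch_size=8,
    num_generations=8,              # completions sampled per prompt (GRPO group size)
)

trainer = GRPOTrainer(
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",
    reward_funcs=reward_len,
    args=training_args,
    train_dataset=dataset,
)
trainer.train()
```

Point being: this is a weekend of work with an off-the-shelf trainer, which is exactly why "we left RL to the community" reads as an open invitation rather than an excuse to merge slop.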