>>57901827
>We plan to finetune the final base model with much higher context length, likely around 128,000 tokens.
>and with quantization methods, that can go down to as low as 4GB for 128k context lengths.
What the fuck. I'm too dumb to understand most of this, but is this 4GB model going to be as smart as the other ones? 128k is more than fucking Claude, this is fucking insane.