>>81285794
Oh no, I'm running 2x 3090s (48GB VRAM) - which is enough to run 4.5/4.65/5bpw at 32k context.
The problem I've run into is that at 2.4bpw the models behave very differently from those above 4.5bpw, because of a massive perplexity cliff that hits right at that threshold. I'd recommend going to a slightly smaller model or a GGUF quant instead because of this phenomenon. (see pic rel)
For 24GB VRAM I've seen a lot of people recommend the recently released Gemma 27B or Mistral Nemo 12B, both of which are extremely efficient and should fit cleanly without lobotomization.
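If you want to sanity-check what fits before downloading, here's a rough back-of-envelope sketch in Python: quantized weights are roughly params * bpw / 8 bytes, plus the KV cache for your context length. The layer/head counts and the Q4-cache setting below are assumptions I picked for illustration, not exact specs for any of these models, and real usage adds runtime overhead on top:

import math

def weights_gib(params_b: float, bpw: float) -> float:
    """GiB for quantized weights: params * bits-per-weight / 8 bytes."""
    return params_b * 1e9 * bpw / 8 / 2**30

def kv_cache_gib(ctx: int, layers: int, kv_heads: int, head_dim: int,
                 bytes_per_elem: float = 2.0) -> float:
    """GiB for the K+V cache (2 tensors per layer); 2.0 = FP16, 0.5 = Q4."""
    return 2 * ctx * layers * kv_heads * head_dim * bytes_per_elem / 2**30

# (label, params_B, layers, kv_heads, head_dim, budget_GiB, bpw)
# -- all dimensions here are assumed/illustrative, not official specs
configs = [
    ("70B-class on 2x3090", 70, 80, 8, 128, 48, 5.0),
    ("27B-class on 1x3090", 27, 46, 16, 128, 24, 4.5),
    ("12B-class on 1x3090", 12, 40, 8, 128, 24, 5.0),
]

for name, p, layers, kvh, hd, budget, bpw in configs:
    w = weights_gib(p, bpw)
    kv = kv_cache_gib(32 * 1024, layers, kvh, hd, bytes_per_elem=0.5)  # Q4 cache
    total = w + kv
    print(f"{name}: {w:.1f} GiB weights + {kv:.1f} GiB KV = {total:.1f} GiB "
          f"({'fits' if total < budget else 'too big'} in {budget} GiB)")

Note the Q4-cache assumption does a lot of work here: at FP16 the 70B-class KV cache alone is ~10 GiB at 32k, which is why 5bpw at that context only really works with a quantized cache (backends like exllamav2 support this).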