>>106465836
It may be crashing KoboldCpp because it’s trying to eat every last MB of VRAM.
Download the latest KoboldCpp from GitHub (grab the CUDA build, koboldcpp.exe, not the nocuda or oldcpu one).
Load the model, then go to the Tokens tab and set MoE CPU Layers to -1.
(That tells it to keep only the active experts on the GPU and park the rest of the expert weights in system RAM, so no more instant crash.)
Back on the Quick Launch tab, set GPU Layers to 25–30 (to fine-tune that number, see the nvidia-smi tip after these steps).
Set Context Size to 4096 to start.
Hit Launch.
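If you want to dial in the GPU layer count instead of guessing, watch VRAM usage while the model loads (this assumes the NVIDIA driver put nvidia-smi on your PATH, which it normally does on Windows):
nvidia-smi -l 1
Bump GPU Layers up until you’re about a GB short of full VRAM, then back off one or two.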
If you still get a black screen, close Kobold and run this one-liner in a normal command prompt:
koboldcpp.exe --model "C:\models\glm-4.5-air-50gb.gguf" --gpulayers 28 --moecpulayers -1 --contextsize 4096
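If it loads but you’re still tight on VRAM, two flags usually buy some headroom (assuming your build has them, both have been in mainline KoboldCpp for a while):
koboldcpp.exe --model "C:\models\glm-4.5-air-50gb.gguf" --gpulayers 28 --moecpulayers -1 --contextsize 4096 --flashattention --quantkv 1
--flashattention shrinks the attention scratch buffers, and --quantkv 1 (which requires flash attention) stores the KV cache in 8-bit instead of f16.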
Needs ~64 GB of system RAM total (the ~50 GB model plus Windows overhead). If you only have 32 GB it’ll still OOM.
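Not sure how much RAM you actually have? Quick check from the same command prompt:
systeminfo | findstr /C:"Total Physical Memory"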