>>21972806
>>21972805
cheers anon. obviously it can spill over into system ram, but that's torturously slow.
before the nf4 flux quant existed i was running the 16gb fp8 and it took like 180 sec per gen, compared to the 11gb nf4. i think nf4 wasn't just quantized but also started from a smaller initial model, plus there was some weird fuckery with offloading the model from vram immediately after each query.
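the offload fuckery is basically this pattern (toy python sketch only, not the actual forge/comfy code; real implementations move torch tensors between cuda and cpu, here devices are just simulated):

```python
class OffloadedModel:
    """toy sketch of vram offloading: weights stay resident in 'ram' (cpu)
    and are copied into 'vram' (gpu) only for the duration of a query,
    then freed immediately. the upload cost is paid on every single query,
    which is why it feels slow despite fitting in limited vram."""

    def __init__(self, weights):
        self.cpu_weights = weights   # resident copy in system ram
        self.gpu_weights = None      # nothing in vram between queries

    def generate(self, prompt):
        # "upload" to vram -- the slow step repeated on each query
        self.gpu_weights = dict(self.cpu_weights)
        try:
            result = f"image for {prompt!r} using {len(self.gpu_weights)} tensors"
        finally:
            # free vram immediately after the query finishes
            self.gpu_weights = None
        return result


model = OffloadedModel({"layer0": [0.1], "layer1": [0.2]})
print(model.generate("a cat"))
print(model.gpu_weights is None)  # vram is empty again between queries
```

in real diffusers-based stacks the equivalent knob is cpu offloading on the pipeline, which trades per-query transfer time for lower peak vram.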
damn, that shit is speedy as fuck.