>>105172720
This does work, but neither the city96 gguf loader nor the multigpu gguf loader wants to partially offload, so it only runs CPU-only, which is untenably slow.
>>105174503
Thanks for the suggestion, but no dice. Still getting OOM when offloading the max number of blocks.
When I try to load it onto the GPU it OOMs, saying it tried to allocate over 120GB, but loaded on the CPU it doesn't even use 32GB. Very mysterious. I'm tired of troubleshooting so I'll try again some other time.