>>43613236
>Ah cool! Though since its one time copy, I don't mind. I don't think it even ate up space, did it?
As in, did good_shit and HLL3 eat up space on your own Drive if you had a shortcut to my folder created? Nope. Beauty of Google Drive shortcuts. I only have the one throwaway account, so Google hasn't bonked me on my paying one for it, bless.
>Can you explain what fp16 model means? I've seen some fp16 models and their filesize was ridiculously low.
It looks like another Anon has already answered you, but yeah, the answer is "precision". One thing though - the GPUs we use in Colab? They automatically use "half" - that means half precision, or fp16 - to save RAM and run faster. They do that on the fly. So loading an fp32 model - the 4 GB ones - into Colab just wastes time while that extra 2 GB downloads, because webui uses "autocast" and tosses that shit out anyways.
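In raw PyTorch terms, here's roughly what "does it on the fly" means - a toy sketch, not webui's actual code:

import torch

# stand-in for a model loaded from an fp32 checkpoint: parameters are float32
model = torch.nn.Linear(768, 768).cuda()
x = torch.randn(1, 768, device="cuda")

# what autocast does: weights stay fp32 in VRAM, but the actual math runs in fp16
with torch.autocast(device_type="cuda", dtype=torch.float16):
    y = model(x)
print(y.dtype)  # torch.float16

# what "half" loading does: cast the whole model to fp16 up front ("--no-half" turns this off)
model.half()
y = model(x.half())
print(y.dtype)  # torch.float16

Either way, those extra fp32 bits you waited on aren't buying you anything on these GPUs.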
There is a flipside to that though! Running on CPU, like my dumb ass was for quite some time? You have to use "--precision full --no-half". Pretty sure you also have to use "--precision full --no-half" on AMD GPUs running ROCm on Linux, and on those older nVIDIA GPUs where, like that Anon mentioned, FP16 performance is actually 64 times slower than FP32.
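If you want to check whether your card is one of those before picking flags, here's a quick way to eyeball it in PyTorch (my own snippet, not something webui provides) - consumer Pascal cards like the 1080 report compute capability 6.x, while Volta/Turing and newer report 7.0 or higher:

import torch

if torch.cuda.is_available():
    major, minor = torch.cuda.get_device_capability(0)
    name = torch.cuda.get_device_name(0)
    # consumer Pascal (sm_6x) has crippled FP16 throughput; Volta/Turing and later do not
    if major < 7:
        print(f"{name} (sm_{major}{minor}): probably want --precision full with --no-half or --upcast-sampling")
    else:
        print(f"{name} (sm_{major}{minor}): fp16 is fine, run with the defaults")
else:
    print("no CUDA GPU - CPU needs --precision full --no-half")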
>NVIDIA GeForce GTX 1080
>GP104, GP104-400-A1, Pascal
>FP16 (half)
>138.6 GFLOPS (1:64)
>FP32 (float)
>8.873 TFLOPS
That's fuckin slow. Compare that to a 1660 Ti, which I don't think was actually any better for gayming? It definitely sounds better for SD...
>NVIDIA GeForce GTX 1660 Ti
>TU116, TU116-400-A1, Turing
>FP16 (half)
>10.87 TFLOPS (2:1)
>FP32 (float)
>5.437 TFLOPS
Slower than the 1080 in full precision, but it ends up faster with half precision. Everyone has to use "--no-half-vae" anyways - I still don't understand why, though - so maybe it's a wash?
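Just to put numbers on that, quick napkin math off those spec sheets (raw throughput only, ignoring memory bandwidth and everything else that actually matters):

gtx1080_fp32 = 8.873e12     # FLOPS
gtx1080_fp16 = 138.6e9
gtx1660ti_fp32 = 5.437e12
gtx1660ti_fp16 = 10.87e12

print(gtx1080_fp32 / gtx1080_fp16)      # ~64  - the 1080 in half is 64x slower than itself in full
print(gtx1660ti_fp16 / gtx1660ti_fp32)  # 2.0  - the 1660 Ti doubles up going to half
print(gtx1660ti_fp16 / gtx1080_fp32)    # ~1.2 - 1660 Ti in half still edges out the 1080 in full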
I think there are better alternatives to "--no-half" though - namely "--upcast-sampling", which accomplishes the same thing but uses less memory and runs faster, according to the documentation.
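My rough mental model of the difference, as a PyTorch toy (this is my own simplification, not pulled from webui's source):

import torch

weights = torch.randn(768, 768, dtype=torch.float16)   # model as stored in an fp16 checkpoint
x = torch.randn(1, 768, dtype=torch.float16)

# "--no-half": convert everything to fp32 up front - weights now sit in memory at twice the size
weights_fp32 = weights.float()
y_no_half = x.float() @ weights_fp32.T

# "--upcast-sampling": weights stay fp16 in VRAM; the numerically touchy ops get bumped to fp32
# only for the duration of the computation, then the fp32 temporaries go away
y_upcast = x.float() @ weights.float().T

print(torch.allclose(y_no_half, y_upcast))  # True - same math, different memory footprint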
There are others that I think might help in that scenario too, like "--opt-sub-quad-attention". Only thing I'm not sure about: if you're using something like a GTX 1080, where you probably should be using "--precision full" all the time anyways, do you still get the same output and performance by using "--precision full" with an FP32 model as you would by using the FP16 model with "--precision full --upcast-sampling" or "--precision full --no-half"? Or is there a penalty - or a difference in output?
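If anyone with a GPU wants to poke at the output half of that question without rendering a hundred test grids, one crude check is to measure how much the weights themselves drift when you round-trip them through fp16 (filename is hypothetical, and I'm assuming the usual "state_dict" layout of SD .ckpt files):

import torch

ckpt = torch.load("model-fp32.ckpt", map_location="cpu")  # hypothetical fp32 checkpoint
sd = ckpt["state_dict"]

worst = 0.0
for k, v in sd.items():
    if not torch.is_tensor(v) or not torch.is_floating_point(v):
        continue
    roundtrip = v.half().float()          # what the fp16 release of the same model would hold
    worst = max(worst, (v - roundtrip).abs().max().item())

print(f"max per-weight error after fp32 -> fp16 -> fp32: {worst:.2e}")
# tiny numbers here would suggest fp16 + upcasting lands in the same place,
# but that's only the weights, not the sampler math, so it doesn't fully answer the question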
I have no GPU, so while I DID go ahead and delete any >4 GB models on the HF - those just have useless EMA weights in them, and EMA weights aren't used during image gen whatsoever - I did keep the fp32 models until I know the answer to that for sure.
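For anyone curious, stripping those EMA weights out of a checkpoint yourself looks roughly like this - the "model_ema." key prefix is my assumption about how SD 1.x checkpoints are laid out, so peek at the keys before nuking anything:

import torch

ckpt = torch.load("model-full-ema.ckpt", map_location="cpu")  # hypothetical 7 GB fp32+EMA checkpoint
sd = ckpt["state_dict"]

# EMA copies are only useful for resuming training, never for generating images
pruned = {k: v for k, v in sd.items() if not k.startswith("model_ema.")}

torch.save({"state_dict": pruned}, "model-pruned.ckpt")
print(f"kept {len(pruned)} of {len(sd)} tensors")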
If you want to read a little bit more:
https://infohub.delltechnologies.com/l/deep-learning-with-dell-emc-isilon-1/floating-point-precision-fp16-vs-fp32
If you want to read considerably more:
https://docs.nvidia.com/deeplearning/performance/mixed-precision-training/index.html