I have a pretrained PyTorch model that I want to run inference on in FP16 instead of FP32. This already works on the GPU, but when I try it on the CPU I get:

RuntimeError: "sum_cpu" not implemented for 'Half'

Any fixes?
As far as I know, many CPU operations in PyTorch simply have no FP16 (Half) kernels, which is exactly what the "not implemented for 'Half'" error is telling you. Hardware acceleration for FP16 is primarily an NVIDIA GPU feature (e.g. the Tensor Cores introduced with the Volta architecture and also present in Turing), and PyTorch's half-precision support has largely followed the CUDA stack. To accelerate inference on CPU with reduced precision, you may want to try the torch.bfloat16 dtype instead, which has much broader CPU support (see https://github.com/pytorch/pytorch/issues/23509).
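Here is a minimal sketch of two ways to do that. The toy model and input shapes are placeholders for illustration; substitute your own pretrained model.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for your pretrained model.
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))
model.eval()

x = torch.randn(1, 16)

# Option 1: cast the weights to bfloat16 once, then feed bfloat16 inputs.
model_bf16 = model.to(torch.bfloat16)
with torch.no_grad():
    out = model_bf16(x.to(torch.bfloat16))
print(out.dtype)  # torch.bfloat16

# Option 2: keep FP32 weights and let autocast choose bfloat16 per op.
# Ops without a bfloat16 CPU kernel fall back to FP32 automatically.
with torch.no_grad(), torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    out = model(x)
```

Option 2 (autocast) is usually the safer starting point, since it avoids the "not implemented" errors for ops that still lack reduced-precision CPU kernels.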