I am new to tensor quantization, and tried doing something as simple as
import torch
x = torch.rand(10, 3)
y = torch.rand(10, 3)
x@y.T
with PyTorch quantized tensors running on CPU. I thus tried
scale, zero_point = 1e-4, 2
dtype = torch.qint32
qx = torch.quantize_per_tensor(x, scale, zero_point, dtype)
qy = torch.quantize_per_tensor(y, scale, zero_point, dtype)
qx@qy.T  # I tried...
...and got the error
RuntimeError: Could not run 'aten::mm' with arguments from the 'QuantizedCPUTensorId' backend. 'aten::mm' is only available for these backends: [CUDATensorId, SparseCPUTensorId, VariableTensorId, CPUTensorId, SparseCUDATensorId].
Is matrix multiplication just not supported, or am I doing something wrong?
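The torch.mul() function (also available as the Tensor.mul() method) performs element-wise multiplication: it multiplies corresponding elements of two tensors, broadcasting their shapes where needed. One operand can be a scalar, and more than two tensors can be multiplied by chaining calls. Note that this is element-wise multiplication, not matrix multiplication.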
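A small sketch of torch.mul covering the cases just described: two tensors, a scalar operand, three tensors by chaining, and broadcasting. The values are arbitrary; the last line contrasts element-wise multiplication with the @ operator.

```python
import torch

a = torch.tensor([1.0, 2.0, 3.0])
b = torch.tensor([4.0, 5.0, 6.0])

elementwise = torch.mul(a, b)   # tensor([ 4., 10., 18.])
scaled = torch.mul(a, 2.0)      # scalar operand: tensor([2., 4., 6.])
three_way = a.mul(b).mul(a)     # chaining for three tensors: tensor([ 4., 20., 54.])

# Broadcasting: a column of shape (2, 1) times a of shape (3,) gives shape (2, 3).
col = torch.tensor([[1.0], [10.0]])
broadcast = torch.mul(col, a)   # tensor([[ 1.,  2.,  3.], [10., 20., 30.]])

# For comparison, @ on two 1-D tensors is a dot product, not element-wise.
dot = a @ b                     # tensor(32.)
```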
torch.bmm performs batched matrix multiplication. Both inputs must be 3-D tensors holding the same number of matrices (their first dimension, the batch size b, must be equal), and the matrices themselves must be compatible: multiplying shapes (b, n, m) and (b, m, p) gives a result of shape (b, n, p). Unlike torch.matmul, torch.bmm does not broadcast.
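A minimal sketch of the shape rule for torch.bmm, using random inputs: a batch of four 2 x 3 matrices times a batch of four 3 x 5 matrices. It also shows that each slice of the result is an ordinary matrix product, and that mismatched batch sizes raise an error.

```python
import torch

torch.manual_seed(0)
A = torch.rand(4, 2, 3)   # batch of 4 matrices, each 2 x 3
B = torch.rand(4, 3, 5)   # batch of 4 matrices, each 3 x 5
C = torch.bmm(A, B)       # batch of 4 products, each 2 x 5
print(C.shape)            # torch.Size([4, 2, 5])

# Each slice of the result is an ordinary matrix product.
print(torch.allclose(C[1], A[1] @ B[1]))  # True

# The batch sizes must match exactly; bmm does not broadcast.
try:
    torch.bmm(A, torch.rand(3, 3, 5))
except RuntimeError as err:
    print("batch sizes must match:", err)
```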
A quantized tensor stores quantized data (represented as int8, uint8, or int32) together with quantization parameters such as scale and zero_point. Quantized tensors support many useful operations, which makes quantized arithmetic easy, and they can also be serialized in quantized format.
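A short sketch of what a per-tensor quantized tensor holds, assuming affine quantization as done by torch.quantize_per_tensor: each value is stored as round(x / scale) + zero_point, clamped to the dtype's range, and dequantize() maps it back as (q - zero_point) * scale. The inputs are chosen as multiples of the scale so the round trip is exact, plus one out-of-range value to show clamping.

```python
import torch

x = torch.tensor([0.0, 0.5, 1.0, 1.5])
qx = torch.quantize_per_tensor(x, scale=0.5, zero_point=2, dtype=torch.quint8)

print(qx.int_repr())      # stored integers: tensor([2, 3, 4, 5], dtype=torch.uint8)
print(qx.q_scale())       # 0.5
print(qx.q_zero_point())  # 2
print(qx.dequantize())    # tensor([0.0000, 0.5000, 1.0000, 1.5000])

# Out-of-range values saturate: 200 / 0.5 + 2 = 402 is clamped to 255,
# which dequantizes to (255 - 2) * 0.5 = 126.5.
qbig = torch.quantize_per_tensor(torch.tensor([200.0]), 0.5, 2, torch.quint8)
print(qbig.int_repr(), qbig.dequantize())
```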
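It is not straightforward to implement matrix multiplication for quantized matrices: the integer products have to be corrected for both inputs' zero points, and the result needs its own scale and zero_point. Therefore, the "conventional" matrix multiplication (@) does not support quantized tensors, as your error message shows (aten::mm has no kernel for the quantized CPU backend).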
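To illustrate the extra bookkeeping, here is a reference computation of dequant(qx) @ dequant(qy).T using only integer arithmetic on the stored values, based on dequant(q) = scale * (q - zero_point), so the product is s_x * s_y * (Q_x - z_x)(Q_y - z_y)^T. The helper name reference_quantized_mm_t is hypothetical, it assumes 2-D per-tensor affine inputs, and it returns floats instead of requantizing into an output scale and zero_point as a real quantized kernel would. It reuses the quantization parameters from the question.

```python
import torch


def reference_quantized_mm_t(qa: torch.Tensor, qb: torch.Tensor) -> torch.Tensor:
    """Float result of dequant(qa) @ dequant(qb).T, computed with integer math.

    Hypothetical helper for illustration; assumes both inputs are 2-D,
    per-tensor affine quantized tensors (from torch.quantize_per_tensor).
    """
    # Remove the zero points; int64 gives headroom for the accumulation
    # (enough for the small values used here, not a general guarantee).
    a = qa.int_repr().to(torch.int64) - qa.q_zero_point()
    b = qb.int_repr().to(torch.int64) - qb.q_zero_point()
    acc = a @ b.T  # integer matrix product (int64 matmul runs on CPU)
    # A single combined rescale by s_a * s_b. A real kernel would then
    # requantize into an output scale/zero_point; here we return floats.
    return acc.to(torch.float64) * (qa.q_scale() * qb.q_scale())


torch.manual_seed(0)
x = torch.rand(10, 3)
y = torch.rand(10, 3)
# Same quantization parameters as in the question.
qx = torch.quantize_per_tensor(x, 1e-4, 2, torch.qint32)
qy = torch.quantize_per_tensor(y, 1e-4, 2, torch.qint32)

ref = reference_quantized_mm_t(qx, qy)
print(ref.shape)  # torch.Size([10, 10])
```

Dequantizing first and multiplying in floating point (qx.dequantize() @ qy.dequantize().T) gives the same values up to rounding; the integer version only shows where the zero points and scales enter.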
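You should look at quantized operations, e.g., torch.nn.quantized.functional.linear (exposed as torch.ao.nn.quantized.functional.linear in newer releases). It computes input @ weight.T, expects a torch.quint8 input and a torch.qint8 weight (so quantize qx and qy with those dtypes rather than torch.qint32), and by default reuses the input's scale and zero_point for the output. Because the weight is transposed internally, pass qy rather than qy.T to get the equivalent of x@y.T:
torch.nn.quantized.functional.linear(qx[None,...], qy)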
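A fuller sketch of the quantized linear approach for the question's x@y.T, under a few stated assumptions. The input is quantized as torch.quint8 and the weight as torch.qint8, the dtypes the quantized linear kernel expects. The scales are illustrative choices for values in [0, 1): the input uses only half of the uint8 range (scale 1/127), mirroring the reduce_range convention used on x86 to avoid intermediate overflow in some fbgemm kernels. Since each output entry is a sum of three products in [0, 1), the output gets its own scale instead of the input's default. linear(input, weight) computes input @ weight.T, so the weight argument is qy itself. The import tries the newer torch.ao path first and falls back to the older torch.nn.quantized path.

```python
import torch

# Newer releases expose the functional API under torch.ao; older ones
# under torch.nn.quantized.
try:
    from torch.ao.nn.quantized import functional as qF
except ImportError:
    from torch.nn.quantized import functional as qF

torch.manual_seed(0)
x = torch.rand(10, 3)
y = torch.rand(10, 3)

# Input as quint8, weight as qint8. The scales are illustrative assumptions:
# both map [0, 1) onto the integers 0..127.
qx = torch.quantize_per_tensor(x, scale=1 / 127, zero_point=0, dtype=torch.quint8)
qy = torch.quantize_per_tensor(y, scale=1 / 127, zero_point=0, dtype=torch.qint8)

# linear(input, weight) computes input @ weight.T, so this approximates x @ y.T.
# Outputs lie in [0, 3), so give the output a matching scale.
qout = qF.linear(qx, qy, scale=3 / 255, zero_point=0)

out = qout.dequantize()
expected = x @ y.T
print(out.shape)                      # torch.Size([10, 10])
print((out - expected).abs().max())   # small quantization error
```

The remaining difference from the float result comes from rounding the inputs, the weight, and the output to their respective scales; finer output scales trade range for precision.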