 

Why is looping through pytorch tensors so slow (compared to Numpy)?

I've been working with image transformations recently and ran into a situation where I have a large array (shape 100,000 x 3) in which each row represents a point in 3D space:

pnt = [x y z]

All I'm trying to do is iterate through the points and multiply each one by a matrix T (shape 3 x 3).

Test with NumPy:

def transform(pnt_cloud, T):
    arr = np.zeros(pnt_cloud.shape[0])

    i = 0
    for pnt in pnt_cloud:
        xyz_pnt = np.dot(T, pnt)

        if xyz_pnt[0] > 0:
            arr[i] = xyz_pnt[0]

        i += 1

    return arr

Timing this function with %time gives:

Out[190]: CPU times: user 670 ms, sys: 7.91 ms, total: 678 ms
Wall time: 674 ms

Test with a PyTorch tensor:

import torch

tensor_cld = torch.tensor(pnt_cloud)
tensor_T   = torch.tensor(T)

def transform(pnt_cloud, T):
    depth_array = torch.zeros(pnt_cloud.shape[0], dtype=torch.float64)

    i = 0
    for pnt in pnt_cloud:
        xyz_pnt = torch.matmul(T, pnt)

        if xyz_pnt[0] > 0:
            depth_array[i] = xyz_pnt[0]

        i += 1

    return depth_array

Timing this function with %time gives:

Out[199]: CPU times: user 6.15 s, sys: 28.1 ms, total: 6.18 s
Wall time: 6.09 s

NOTE: Doing the same with torch.jit only reduces the runtime by about 2 s.

I would have thought that PyTorch tensor computations would be much faster due to the way PyTorch breaks its code down in the compiling stage. What am I missing here?

Would there be any faster way to do this other than using Numba?

asked Dec 02 '25 11:12 by lakshjaisinghani

2 Answers

For the speed, I got this reply from the PyTorch forums:

  1. Operations on 1-3 elements are generally rather expensive in PyTorch, as the overhead of Tensor creation becomes significant (this includes setting single elements); I think this is the main thing here. This is also why the JIT doesn't help a whole lot (it only takes away the Python overhead) and why NumPy shines (where e.g. the assignment to depth_array[i] is just a memory write).

  2. The matmul itself might differ in speed if PyTorch and NumPy use different BLAS backends for it.
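The overhead point can be illustrated by vectorizing: one batched matmul over the whole cloud replaces 100,000 tiny per-point calls, so the Tensor-creation cost is paid once instead of per element. A minimal sketch (random data standing in for the real point cloud, which is an assumption here):

```python
import numpy as np
import torch

# hypothetical stand-in data; the real cloud has the same shape
pnt_cloud = np.random.rand(100_000, 3)
T = np.random.rand(3, 3)

tensor_cld = torch.tensor(pnt_cloud)
tensor_T = torch.tensor(T)

with torch.no_grad():
    # (n, 3) @ (3,) -> (n,): the first component of T @ pnt for every
    # point at once, then zero out the non-positive entries
    depth_array = torch.clamp(tensor_cld @ tensor_T[0], min=0)
```

Note that `tensor_cld @ tensor_T[0]` computes exactly `np.dot(T, pnt)[0]` for every row, since the first component of `T @ pnt` is the dot product of T's first row with the point.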

answered Dec 04 '25 01:12 by lakshjaisinghani

Why are you using a for loop??
Why do you compute a 3x3 dot product and then only use the first element of the result??

You can do all the math in a single matmul:

with torch.no_grad():
    depth_array = torch.matmul(pnt_cloud, T[:1, :].T)  # nx3 dot 3x1 -> nx1
    # since you only want non-negative results
    depth_array = torch.clamp(depth_array, min=0)

Since you want to compare the runtime to NumPy, you should also disable gradient tracking (the `torch.no_grad()` block above).
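A rough timing sketch of the loop vs. the single matmul, using `time.perf_counter` on a smaller random cloud (both the data and the 10,000-point size are assumptions, chosen so the loop finishes quickly):

```python
import time
import numpy as np
import torch

pnt_cloud = torch.tensor(np.random.rand(10_000, 3))
T = torch.tensor(np.random.rand(3, 3))

# per-point loop: one tiny matmul plus one indexed write per point
start = time.perf_counter()
loop_result = torch.zeros(pnt_cloud.shape[0], dtype=torch.float64)
for i, pnt in enumerate(pnt_cloud):
    v = torch.matmul(T, pnt)[0]
    if v > 0:
        loop_result[i] = v
loop_time = time.perf_counter() - start

# one batched matmul over the whole cloud
start = time.perf_counter()
with torch.no_grad():
    vec_result = torch.clamp(pnt_cloud @ T[0], min=0)
vec_time = time.perf_counter() - start

print(f"loop: {loop_time:.3f}s  vectorized: {vec_time:.5f}s")
```

The two results are identical; only the per-element Python and Tensor overhead differs.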

answered Dec 04 '25 00:12 by Shai


