For each element in a randomized array of 2D indices (with potential duplicates), I want to "+=1" to the corresponding grid in a 2D zero array. However, I don't know how to optimize the computation. Using the standard for loop, as shown here,
def interadd():
U = 100
input = np.random.random(size=(5000,2)) * U
idx = np.floor(input).astype(np.int)
grids = np.zeros((U,U))
for i in range(len(input)):
grids[idx[i,0],idx[i,1]] += 1
return grids
The runtime can be quite significant:
>> timeit(interadd, number=5000)
43.69953393936157
Is there a way to vectorize this iterative process?
The vectorized function evaluates pyfunc over successive tuples of the input arrays like the python map function, except it uses the broadcasting rules of numpy. The data type of the output of vectorized is determined by calling the function with the first element of the input.
Again, some have observed vectorize to be faster than normal for loops, but even the NumPy documentation states: “The vectorize function is provided primarily for convenience, not for performance. The implementation is essentially a for loop.”
In order to create a vector, we use np. array method. Note: We can create vector with other method as well which return 1-D numpy array for example np. arange(10), np.
Vectorization is a term used outside of numpy, and in very basic terms is parallelisation of calculations. In normal python this would be done element by element using something like a for loop, so four calculations one after the other.
You could speed it up a little by using np.add.at
, which correctly handles the case of duplicate indices:
def interadd(U, idx):
grids = np.zeros((U,U))
for i in range(len(idx)):
grids[idx[i,0],idx[i,1]] += 1
return grids
def interadd2(U, idx):
grids = np.zeros((U,U))
np.add.at(grids, idx.T.tolist(), 1)
return grids
def interadd3(U, idx):
# YXD suggestion
grids = np.zeros((U,U))
np.add.at(grids, (idx[:,0], idx[:,1]), 1)
return grids
which gives
>>> U = 100
>>> idx = np.floor(np.random.random(size=(5000,2))*U).astype(np.int)
>>> (interadd(U, idx) == interadd2(U, idx)).all()
True
>>> %timeit interadd(U, idx)
100 loops, best of 3: 8.48 ms per loop
>>> %timeit interadd2(U, idx)
100 loops, best of 3: 2.62 ms per loop
And YXD's suggestion:
>>> (interadd(U, idx) == interadd3(U, idx)).all()
True
>>> %timeit interadd3(U, idx)
1000 loops, best of 3: 1.09 ms per loop
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With