In NumPy, I have a d x n array A and a list L of length n, describing where I want each column of A to end up in matrix B. The idea is that column i of matrix B is the sum of all columns of A for which the corresponding value in L is i.
I can do this with a for loop:
import numpy as np

A = np.arange(15).reshape(3, 5)
L = [0, 1, 2, 1, 1]
# assume I know the desired number of columns,
# which is also one more than the maximum value of L
n_cols = 3
B = np.zeros((len(A), n_cols))
for i, a in enumerate(A.T):
    B[:, L[i]] += a
I was wondering if there is a way to do this by slicing array A (or in some other vectorized manner)?
You are sum-reducing/collapsing the columns of A, using L to select which columns are grouped together. The columns of the output array are updated based on the unique values in L.
Thus, you can use np.add.reduceat for a vectorized solution, like so -
L = np.asarray(L)  # argsort needs an array, not a list
sidx = L.argsort()
col_idx, grp_start_idx = np.unique(L[sidx], return_index=True)
B_out = np.zeros((len(A), n_cols))
B_out[:, col_idx] = np.add.reduceat(A[:, sidx], grp_start_idx, axis=1)
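An alternative one-liner worth mentioning: `np.add.at` performs an unbuffered scatter-add, so repeated indices in L accumulate correctly (unlike plain fancy-indexed `+=`). A minimal sketch on the question's sample inputs (the transposes are only there because `ufunc.at` indexes along the first axis):

```python
import numpy as np

A = np.arange(15).reshape(3, 5)
L = np.array([0, 1, 2, 1, 1])
n_cols = 3

B = np.zeros((A.shape[0], n_cols))
# B.T is a view, so scatter-adding rows into B.T[L[i]]
# accumulates the matching columns of A into B.
np.add.at(B.T, L, A.T)
```

This is usually slower than the `reduceat` route because `ufunc.at` is not buffered, but it avoids the sort and is easier to read.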
Runtime test -
In [129]: def org_app(A,n_cols):
...: B = np.zeros((len(A), n_cols))
...: for i, a in enumerate(A.T):
...: B[:, L[i]] += a
...: return B
...:
...: def vectorized_app(A,n_cols):
...: sidx = L.argsort()
...: col_idx, grp_start_idx = np.unique(L[sidx],return_index=True)
...: B_out = np.zeros((len(A), n_cols))
...: B_out[:,col_idx] = np.add.reduceat(A[:,sidx],grp_start_idx,axis=1)
...: return B_out
...:
In [130]: # Setup inputs with an appreciable no. of cols & lesser rows
...: # so as that memory bandwidth to work with huge number of
...: # row elems doesn't become the bottleneck
...: d,n_cols = 10,5000
...: A = np.random.rand(d,n_cols)
...: L = np.random.randint(0,n_cols,(n_cols,))
...:
In [131]: np.allclose(org_app(A,n_cols),vectorized_app(A,n_cols))
Out[131]: True
In [132]: %timeit org_app(A,n_cols)
10 loops, best of 3: 33.3 ms per loop
In [133]: %timeit vectorized_app(A,n_cols)
100 loops, best of 3: 1.87 ms per loop
As the number of rows becomes comparable to the number of columns in A, the higher memory bandwidth demanded by the vectorized approach would offset much of its speedup.