I have a large 2D numpy array. I would like to be able to efficiently run row-wise operations on subsets of the columns, without copying the data.
In what follows,
a = np.arange(10000000).reshape(1000, 10000) and columns = np.arange(1, 10000, 2). For reference,
In [4]: %timeit a.sum(axis=1)
7.26 ms ± 431 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
The approaches I am aware of are:
In [5]: %timeit a[:, columns].sum(axis=1)
42.5 ms ± 197 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [6]: cols_mask = np.zeros(10000, dtype=bool)
...: cols_mask[columns] = True
In [7]: %timeit a[:, cols_mask].sum(axis=1)
42.1 ms ± 302 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [8]: cells_mask = np.ones((1000, 10000), dtype=bool)
In [9]: cells_mask[:, columns] = False
In [10]: am = np.ma.masked_array(a, mask=cells_mask)
In [11]: %timeit am.sum(axis=1)
80 ms ± 2.71 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [12]: %timeit sum([a[:, i] for i in columns])
31.2 ms ± 531 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
Somewhat surprisingly to me, the last approach is the most efficient; moreover, it avoids copying the full data, which for me is a prerequisite. However, it is still much slower than the plain sum over the full array (which processes twice as many elements), and most importantly, it is not trivial to generalize to other operations (e.g., cumsum).
Is there any approach I am missing? I would be fine with writing some Cython code, but I would like the approach to work for any numpy function, not just sum.
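To make the no-copy property explicit: the list-comprehension trick is just an accumulation over column views. A minimal sketch of the same pattern as an in-place loop (my own restatement, using a and columns as defined above):

out = np.zeros(a.shape[0], dtype=a.dtype)
for i in columns:
    out += a[:, i]  # a[:, i] is a view, so the column subset is never copied

It shares the same drawback, though: every operation (sum, cumsum, ...) needs its own hand-written loop.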
On this one, Pythran seems a bit faster than Numba, at least on my rig:
import numpy as np
#pythran export col_sum(float[:,:], int[:])
#pythran export col_sum(int[:,:], int[:])
def col_sum(data, idx):
    # rows of the transpose are the selected columns; reduce along them
    return data.T[idx].sum(0)
Compile with pythran <filename.py>
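If the source file is saved as cs_pythran.py (a name I am assuming from the module used in the timings below), compilation produces an extension module that imports like any other:

# shell: pythran cs_pythran.py   -> produces cs_pythran.<abi>.so next to the source
import cs_pythran
result = cs_pythran.col_sum(a, columns)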
Timings:
from timeit import timeit

timeit(lambda: cs_pythran.col_sum(a, columns), number=1000)
# 1.644187423051335

timeit(lambda: cs_numba.col_sum(a, columns), number=1000)
# 2.635075871949084
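The Numba version used for comparison isn't shown above; a sketch of what it might look like (an assumption on my part, written with explicit loops, which Numba compiles well):

import numba
import numpy as np

@numba.njit
def col_sum(data, idx):
    # explicit loops compile to tight machine code under Numba;
    # keeping rows in the outer loop is cache-friendly for C-ordered data
    out = np.zeros(data.shape[0], dtype=data.dtype)
    for i in range(data.shape[0]):
        for j in idx:
            out[i] += data[i, j]
    return out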