Only part of this question has been asked before ([1][2]), which explained how to split numpy arrays. I am quite new to Python. I have an array containing 262144 items and want to split it into smaller arrays of length 512, sort them individually and sum up their first five values, but I am unsure how to proceed beyond this line:
np.array_split(vector, 512)
How do I access and analyse each array? Would it be a good idea to keep using numpy arrays, or should I go back to using a dictionary instead?
Splitting as such won't be an efficient solution. Instead, we could reshape, which effectively creates subarrays as rows of a 2D array. These would be views into the input array, so there is no additional memory requirement. Then we would get the argsort indices, select the first five indices per row, use them to pick the five smallest values in each row, and finally sum those values for the desired output.
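A quick way to check the claim that the reshape is a view rather than a copy. This is only a minimal sketch, and the small array is made up for illustration:

```python
import numpy as np

a = np.arange(12)          # illustrative input, not the asker's data
b = a.reshape(-1, 4)       # 3 rows of 4 elements each

# A view shares its memory buffer with the original array
print(np.shares_memory(a, b))   # True

# Writing through the view is visible in the original
b[0, 0] = 99
print(a[0])                     # 99
```

Because the rows are views, anything that modifies them in place also modifies the input array.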
Thus, we would have an implementation like so -
N = 512 # Number of elements in each split array
M = 5 # Number of smallest elements to sum in each subarray
b = a.reshape(-1,N)
out = b[np.arange(b.shape[0])[:,None], b.argsort(1)[:,:M]].sum(1)
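For reuse, the two lines above could be wrapped in a small function. The sketch below assumes the array length is a multiple of N and that M <= N. The function name `sum_smallest_per_chunk` and the random test data are introduced here for illustration only:

```python
import numpy as np

def sum_smallest_per_chunk(a, N, M):
    """Sum the M smallest values in each consecutive chunk of N elements.

    Hypothetical helper; assumes a.size is a multiple of N and M <= N.
    """
    b = a.reshape(-1, N)                     # one chunk per row
    rows = np.arange(b.shape[0])[:, None]    # column vector of row indices
    cols = b.argsort(axis=1)[:, :M]          # column indices of the M smallest per row
    return b[rows, cols].sum(axis=1)

rng = np.random.default_rng(0)
a = rng.integers(11, 99, size=512 * 512)     # made-up data of the asker's size
out = sum_smallest_per_chunk(a, N=512, M=5)
print(out.shape)                             # (512,)
```

The result has one sum per chunk, so 512 values for the 262144-element case.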
Step-by-step sample run -
In [217]: a # Input array
Out[217]: array([45, 19, 71, 53, 20, 33, 31, 20, 41, 19, 38, 31, 86, 34])
In [218]: N = 7 # 512 for original case, 7 for sample
In [219]: M = 5
# Reshape into a 2D array with N columns (one subarray per row)
In [220]: b = a.reshape(-1,N)
In [224]: b
Out[224]:
array([[45, 19, 71, 53, 20, 33, 31],
[20, 41, 19, 38, 31, 86, 34]])
# Get argsort indices per row
In [225]: b.argsort(1)
Out[225]:
array([[1, 4, 6, 5, 0, 3, 2],
[2, 0, 4, 6, 3, 1, 5]])
# Select first M ones
In [226]: b.argsort(1)[:,:M]
Out[226]:
array([[1, 4, 6, 5, 0],
[2, 0, 4, 6, 3]])
# Use fancy-indexing to select those M ones per row
In [227]: b[np.arange(b.shape[0])[:,None], b.argsort(1)[:,:M]]
Out[227]:
array([[19, 20, 31, 33, 45],
[19, 20, 31, 34, 38]])
# Finally sum along each row
In [228]: b[np.arange(b.shape[0])[:,None], b.argsort(1)[:,:M]].sum(1)
Out[228]: array([148, 142])
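On NumPy 1.15 or later, the fancy-indexing step can also be written with `np.take_along_axis`, which avoids building the row-index array by hand. This is an alternative spelling, not part of the original answer. The sketch reuses the sample values from the run above:

```python
import numpy as np

a = np.array([45, 19, 71, 53, 20, 33, 31, 20, 41, 19, 38, 31, 86, 34])
N, M = 7, 5
b = a.reshape(-1, N)

idx = b.argsort(axis=1)[:, :M]                  # indices of the M smallest per row
smallest = np.take_along_axis(b, idx, axis=1)   # same values as the fancy-indexing step
print(smallest.sum(axis=1))                     # [148 142]
```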
Performance boost with np.argpartition -
out = b[np.arange(b.shape[0])[:,None], np.argpartition(b,M,axis=1)[:,:M]].sum(1)
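This works because only the set of the M smallest values matters for the sum, not their order. A partition guarantees that set without fully sorting each row. Below is a sketch, on made-up random data, that checks the argpartition result against the argsort one. It also shows `np.partition`, which returns the values directly. That variant is a suggestion added here, and all of this assumes M < N:

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.integers(11, 99, size=512 * 512)   # illustrative data
N, M = 512, 5
b = a.reshape(-1, N)
rows = np.arange(b.shape[0])[:, None]

via_argsort = b[rows, b.argsort(axis=1)[:, :M]].sum(axis=1)
via_argpartition = b[rows, np.argpartition(b, M, axis=1)[:, :M]].sum(axis=1)
# np.partition returns the partitioned values themselves, so no index lookup is needed
via_partition = np.partition(b, M, axis=1)[:, :M].sum(axis=1)

print(np.array_equal(via_argsort, via_argpartition))   # True
print(np.array_equal(via_argsort, via_partition))      # True
```

Note that the first M columns after partitioning are not sorted among themselves, which is fine when they are only being summed.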
Runtime test -
In [236]: a = np.random.randint(11,99,(512*512))
In [237]: N = 512
In [238]: M = 5
In [239]: b = a.reshape(-1,N)
In [240]: %timeit b[np.arange(b.shape[0])[:,None], b.argsort(1)[:,:M]].sum(1)
100 loops, best of 3: 14.2 ms per loop
In [241]: %timeit b[np.arange(b.shape[0])[:,None], \
np.argpartition(b,M,axis=1)[:,:M]].sum(1)
100 loops, best of 3: 3.57 ms per loop
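Outside IPython, a comparable measurement could be made with the standard `timeit` module. This is only a sketch. The function names are introduced here, and the timings will vary by machine and NumPy version, so no numbers are shown:

```python
import timeit
import numpy as np

a = np.random.randint(11, 99, 512 * 512)
N, M = 512, 5
b = a.reshape(-1, N)
rows = np.arange(b.shape[0])[:, None]

def with_argsort():
    return b[rows, b.argsort(axis=1)[:, :M]].sum(axis=1)

def with_argpartition():
    return b[rows, np.argpartition(b, M, axis=1)[:, :M]].sum(axis=1)

# Best of 3 repeats, 20 calls each; report milliseconds per call
t_sort = min(timeit.repeat(with_argsort, number=20, repeat=3)) / 20
t_part = min(timeit.repeat(with_argpartition, number=20, repeat=3)) / 20
print(f"argsort:      {t_sort * 1e3:.2f} ms per call")
print(f"argpartition: {t_part * 1e3:.2f} ms per call")
```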
A more detailed, step-by-step version of what you want:
import numpy as np

vector = np.random.rand(262144)
# 512 sections; since 262144 = 512 * 512, each section has 512 elements
splits = np.array_split(vector, 512)
sums = []
for split in splits:
    # sort it in place (each split is a view, so this also reorders vector)
    split.sort()
    # take the five smallest values
    subSplit = split[:5]
    # build the sum
    splitSum = sum(subSplit)
    # add to the list of results
    sums.append(splitSum)
print(np.array(sums).shape)
This gives the same output as @Divakar's solution.
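One detail worth checking: the second argument of `np.array_split` is the number of sections, not the length of each piece. The two coincide here only because 262144 = 512 * 512. Below is a sketch for a different length that also compares the loop against a reshape-based computation. `chunk_len` and `n_chunks` are names introduced for illustration, and the length is assumed to divide evenly:

```python
import numpy as np

rng = np.random.default_rng(0)
vector = rng.random(512 * 100)         # illustrative length: 100 chunks of 512
chunk_len = 512
n_chunks = vector.size // chunk_len    # assumes the length divides evenly

splits = np.array_split(vector, n_chunks)
print(len(splits), splits[0].size)     # 100 512

# np.sort returns a sorted copy, so vector itself is left unchanged
loop_sums = np.array([np.sort(s)[:5].sum() for s in splits])
reshape_sums = np.sort(vector.reshape(-1, chunk_len), axis=1)[:, :5].sum(axis=1)
print(np.allclose(loop_sums, reshape_sums))   # True
```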