I have two 2D numpy arrays (simplified in this example with respect to size and content) with identical sizes.
An ID matrix:
1 1 1 2 2
1 1 2 2 5
1 1 2 5 5
1 2 2 5 5
2 2 5 5 5
and a value matrix:
14.8 17.0 74.3 40.3 90.2
25.2 75.9 5.6 40.0 33.7
78.9 39.3 11.3 63.6 56.7
11.4 75.7 78.4 88.7 58.6
79.6 32.3 35.3 52.5 13.3
My goal is to count and sum the values from the second matrix grouped by the IDs from the first matrix:
1: (8, 336.8)
2: (9, 453.4)
5: (8, 402.4)
I can do this in a for loop, but when the matrices have sizes in the thousands instead of just 5x5, with thousands of unique IDs, it takes a long time to process.
Does numpy have a clever method or a combination of methods for doing this?
To add two arrays element-wise, use numpy.add(arr1, arr2). The two arrays must have the same shape, or shapes that NumPy can broadcast against each other. NumPy does not pad mismatched arrays automatically, so if the lengths differ and cannot be broadcast, pad the shorter array with zeros yourself before adding.
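As a minimal sketch of element-wise addition with numpy.add, assuming small made-up arrays (the names a, b and col are only for illustration): it shows the same-shape case, manual zero padding with np.pad for lengths that do not match, and a broadcast between a column and a row.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0, 4.0])
b = np.array([10.0, 20.0])

# Same shape: element-wise addition works directly.
same = np.add(a, a)

# Lengths 4 and 2 cannot be broadcast, so pad the shorter array with zeros first.
# np.pad uses constant mode with value 0 by default.
b_padded = np.pad(b, (0, a.size - b.size))
padded_sum = np.add(a, b_padded)

# Broadcasting: a (2, 1) column combined with a (4,) row gives a (2, 4) result.
col = np.array([[100.0], [200.0]])
broadcast_sum = np.add(col, a)

print(same)
print(padded_sum)
print(broadcast_sum.shape)
```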
When np.sum receives an array of booleans as its argument, it counts each True as 1 and each False as 0 and returns the total. For instance, np.sum([True, True, False]) returns 2. Hope this helps.
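A small sketch of counting matches by summing a boolean mask, assuming a made-up ID array (the name ids is illustrative). np.count_nonzero is shown alongside as an equivalent way to count True entries.

```python
import numpy as np

ids = np.array([[1, 1, 2],
                [1, 2, 5]])

mask = ids == 1                      # boolean array, True where the ID equals 1
count = np.sum(mask)                 # True counts as 1, False as 0
also_count = np.count_nonzero(mask)  # same count without the implicit conversion

print(count, also_count)
```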
We can join arrays using the numpy.concatenate() function. Arrays are concatenated either row-wise (axis=0), which requires them to have the same number of columns, or column-wise (axis=1), which requires them to have the same number of rows. Column-wise concatenation is selected by passing axis=1 as an argument to the function.
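A short sketch of both directions of np.concatenate, assuming small illustrative arrays a, b and c. The shapes in the comments show which dimension has to match in each case.

```python
import numpy as np

a = np.array([[1, 2],
              [3, 4]])      # shape (2, 2)
b = np.array([[5, 6]])      # shape (1, 2): same number of columns as a
c = np.array([[7],
              [8]])         # shape (2, 1): same number of rows as a

rows = np.concatenate((a, b), axis=0)  # stack b below a
cols = np.concatenate((a, c), axis=1)  # place c to the right of a

print(rows.shape, cols.shape)
```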
Here's a vectorized approach to get the counts for ID and the ID-based summed values for value with a combination of np.unique and np.bincount -
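# unqID: sorted unique IDs, idx: position of each element's ID in unqID, IDsums: per-ID counts
unqID, idx, IDsums = np.unique(ID, return_counts=True, return_inverse=True)
# idx.ravel() keeps this working on NumPy 2.x, where the inverse index has the shape of ID
value_sums = np.bincount(idx.ravel(), value.ravel())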
To get the final output as a dictionary, you can use a dictionary comprehension to gather the counts and summed values, like so -
{i:(IDsums[itr],value_sums[itr]) for itr,i in enumerate(unqID)}
Sample run -
In [86]: ID
Out[86]:
array([[1, 1, 1, 2, 2],
       [1, 1, 2, 2, 5],
       [1, 1, 2, 5, 5],
       [1, 2, 2, 5, 5],
       [2, 2, 5, 5, 5]])
In [87]: value
Out[87]:
array([[ 14.8,  17. ,  74.3,  40.3,  90.2],
       [ 25.2,  75.9,   5.6,  40. ,  33.7],
       [ 78.9,  39.3,  11.3,  63.6,  56.7],
       [ 11.4,  75.7,  78.4,  88.7,  58.6],
       [ 79.6,  32.3,  35.3,  52.5,  13.3]])
In [88]: unqID,idx,IDsums = np.unique(ID,return_counts=True,return_inverse=True)
...: value_sums = np.bincount(idx,value.ravel())
...:
In [89]: {i:(IDsums[itr],value_sums[itr]) for itr,i in enumerate(unqID)}
Out[89]:
{1: (8, 336.80000000000001),
2: (9, 453.40000000000003),
5: (8, 402.40000000000003)}
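For reuse, the same np.unique and np.bincount combination could be wrapped in a function. This is only a sketch: the name group_count_sum, the shape check and the rounding in the final dictionary are assumptions added for illustration, not part of the answer above.

```python
import numpy as np

def group_count_sum(ids, values):
    """Return (unique_ids, counts, sums) of values grouped by ids (hypothetical helper)."""
    ids = np.asarray(ids)
    values = np.asarray(values, dtype=float)
    if ids.shape != values.shape:
        raise ValueError("ids and values must have the same shape")
    unique_ids, inverse, counts = np.unique(
        ids, return_inverse=True, return_counts=True)
    # ravel() flattens the inverse index regardless of the shape np.unique returns it in
    sums = np.bincount(inverse.ravel(), weights=values.ravel(),
                       minlength=unique_ids.size)
    return unique_ids, counts, sums

ID = np.array([[1, 1, 1, 2, 2],
               [1, 1, 2, 2, 5],
               [1, 1, 2, 5, 5],
               [1, 2, 2, 5, 5],
               [2, 2, 5, 5, 5]])
value = np.array([[14.8, 17.0, 74.3, 40.3, 90.2],
                  [25.2, 75.9, 5.6, 40.0, 33.7],
                  [78.9, 39.3, 11.3, 63.6, 56.7],
                  [11.4, 75.7, 78.4, 88.7, 58.6],
                  [79.6, 32.3, 35.3, 52.5, 13.3]])

unique_ids, counts, sums = group_count_sum(ID, value)
# Convert to plain Python numbers and round the sums for display only.
result = {int(i): (int(c), round(float(s), 1))
          for i, c, s in zip(unique_ids, counts, sums)}
print(result)
```

Returning the three arrays rather than a dictionary keeps the result vectorized, which matters when there are thousands of unique IDs; the dictionary is built only at the end for printing.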
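This is possible with a combination of a few simple methods: numpy.unique to find each ID, a boolean mask to select the entries that match that ID, and numpy.sum to count the matches and total their values. This can look like this: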
import numpy as np
ids = np.array([[1, 1, 1, 2, 2],
                [1, 1, 2, 2, 5],
                [1, 1, 2, 5, 5],
                [1, 2, 2, 5, 5],
                [2, 2, 5, 5, 5]])
values = np.array([[14.8, 17.0, 74.3, 40.3, 90.2],
                   [25.2, 75.9, 5.6, 40.0, 33.7],
                   [78.9, 39.3, 11.3, 63.6, 56.7],
                   [11.4, 75.7, 78.4, 88.7, 58.6],
                   [79.6, 32.3, 35.3, 52.5, 13.3]])
for i in np.unique(ids):  # loop through all IDs
    mask = ids == i  # find entries that match current ID
    count = np.sum(mask)  # number of matches
    total = np.sum(values[mask])  # sum of the matching values
    print('{}: ({}, {:.1f})'.format(i, count, total))  # print result
# Output:
# 1: (8, 336.8)
# 2: (9, 453.4)
# 5: (8, 402.4)
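The masked loop makes one full pass over the array per unique ID, so with thousands of IDs it may be worth checking it against a single-pass np.unique plus np.bincount variant. The sketch below does that on random data; the function names, array sizes and seed are all assumptions chosen for illustration, and the timings will vary by machine.

```python
import numpy as np
from timeit import timeit

rng = np.random.default_rng(0)               # fixed seed, made-up test data
ids = rng.integers(0, 200, size=(200, 200))  # IDs drawn from 0..199
values = rng.random((200, 200))

def masked_loop(ids, values):
    """One boolean mask per unique ID."""
    out = {}
    for i in np.unique(ids):
        mask = ids == i
        out[int(i)] = (int(np.sum(mask)), float(np.sum(values[mask])))
    return out

def single_pass(ids, values):
    """np.unique plus np.bincount, with no Python-level loop over the data."""
    uniq, inverse, counts = np.unique(ids, return_inverse=True, return_counts=True)
    sums = np.bincount(inverse.ravel(), weights=values.ravel())
    return {int(i): (int(c), float(s)) for i, c, s in zip(uniq, counts, sums)}

loop_result = masked_loop(ids, values)
fast_result = single_pass(ids, values)

# Counts must match exactly; sums are compared with a floating-point tolerance.
same_keys = loop_result.keys() == fast_result.keys()
same_counts = all(loop_result[k][0] == fast_result[k][0] for k in loop_result)
sums_close = all(np.isclose(loop_result[k][1], fast_result[k][1]) for k in loop_result)
print(same_keys, same_counts, sums_close)

print("loop  :", timeit(lambda: masked_loop(ids, values), number=3))
print("single:", timeit(lambda: single_pass(ids, values), number=3))
```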