
Merging records in a Numpy structured array

I have a Numpy structured array that is sorted by the first column:

x = array([(2, 3), (2, 8), (4, 1)], dtype=[('recod', '<u8'), ('count', '<u4')])

I need to merge records (sum the values of the second column) where

x[n][0] == x[n + 1][0]

In this case, the desired output would be:

x = array([(2, 11), (4, 1)], dtype=[('recod', '<u8'), ('count', '<u4')])

What's the best way to achieve this?

krlk89 asked Feb 26 '26 15:02



2 Answers

You can use np.unique with return_inverse=True to get an integer ID for each element of the first column, and then use np.bincount with those IDs as bins and the second column as weights to sum the second-column values per ID:

In [140]: A
Out[140]: 
array([[25,  1],
       [37,  3],
       [37,  2],
       [47,  1],
       [59,  2]])

In [141]: unqA,idx = np.unique(A[:,0],return_inverse=True)

In [142]: np.column_stack((unqA,np.bincount(idx,A[:,1])))
Out[142]: 
array([[ 25.,   1.],
       [ 37.,   5.],
       [ 47.,   1.],
       [ 59.,   2.]])
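
The session above can be wrapped in a small function. This is only a sketch: the name merge_by_key_unique is hypothetical, and it assumes a two-column integer array. np.bincount accumulates weights in float64, which is why the output above shows 25. and 5.; the sketch casts the sums back to the input dtype, which is exact only while the sums stay below 2**53.

```python
import numpy as np

def merge_by_key_unique(a):
    """Sum column 1 of a 2-column integer array over equal values in column 0.

    Hypothetical helper for illustration; not part of NumPy.
    """
    # idx[i] is the position of a[i, 0] within the sorted unique keys
    keys, idx = np.unique(a[:, 0], return_inverse=True)
    # bincount with weights accumulates in float64, so cast back afterwards
    sums = np.bincount(idx, weights=a[:, 1]).astype(a.dtype)
    return np.column_stack((keys, sums))

A = np.array([[25, 1], [37, 3], [37, 2], [47, 1], [59, 2]])
merged = merge_by_key_unique(A)
print(merged)
```

Because np.unique sorts its input, this version also works when the first column is not already sorted.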

You can avoid np.unique by combining np.diff and np.cumsum. This may help because np.unique sorts internally, which is unnecessary here since the input data is already sorted. The implementation would look something like this:

In [201]: A
Out[201]: 
array([[25,  1],
       [37,  3],
       [37,  2],
       [47,  1],
       [59,  2]])

In [202]: unq1 = np.append(True,np.diff(A[:,0])!=0)

In [203]: np.column_stack((A[:,0][unq1],np.bincount(unq1.cumsum()-1,A[:,1])))
Out[203]: 
array([[ 25.,   1.],
       [ 37.,   5.],
       [ 47.,   1.],
       [ 59.,   2.]])
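
As a variation on the same idea (not from the answer itself), np.add.reduceat can sum each run of equal keys directly and keeps the integer dtype, so no float round trip is involved. A minimal sketch, assuming a non-empty two-column array already sorted by its first column; the name merge_sorted_by_key is hypothetical.

```python
import numpy as np

def merge_sorted_by_key(a):
    """Sum column 1 over runs of equal values in column 0.

    Hypothetical helper; assumes `a` is non-empty and sorted by column 0.
    """
    keys = a[:, 0]
    # True at the first row of each run of equal keys
    is_start = np.append(True, np.diff(keys) != 0)
    starts = np.flatnonzero(is_start)
    # reduceat sums a[starts[i]:starts[i + 1], 1] for each run
    sums = np.add.reduceat(a[:, 1], starts)
    return np.column_stack((keys[starts], sums))

A = np.array([[25, 1], [37, 3], [37, 2], [47, 1], [59, 2]])
merged_sorted = merge_sorted_by_key(A)
print(merged_sorted)
```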
Divakar answered Mar 01 '26 05:03



Divakar's answer cast in structured array form:

In [500]: x=np.array([(25, 1), (37, 3), (37, 2), (47, 1), (59, 2)], dtype=[('recod', '<u8'), ('count', '<u4')])

Find the unique values and sum the counts of the duplicates:

In [501]: unqA, idx=np.unique(x['recod'], return_inverse=True)    
In [502]: cnt = np.bincount(idx, x['count'])

Make a new structured array and fill the fields:

In [503]: x1 = np.empty(unqA.shape, dtype=x.dtype)
In [504]: x1['recod'] = unqA
In [505]: x1['count'] = cnt

In [506]: x1
Out[506]: 
array([(25, 1), (37, 5), (47, 1), (59, 2)], 
      dtype=[('recod', '<u8'), ('count', '<u4')])
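
Put together and applied to the array from the question, the steps above might look like the sketch below. The function name merge_records and its key/value parameters are hypothetical, and it assumes the dtype has only those two fields (any other field would be left uninitialized by np.empty).

```python
import numpy as np

def merge_records(x, key="recod", value="count"):
    """Merge records with equal `key`, summing `value`.

    Hypothetical helper; assumes x.dtype has only these two fields.
    """
    keys, idx = np.unique(x[key], return_inverse=True)
    sums = np.bincount(idx, weights=x[value])
    out = np.empty(keys.shape, dtype=x.dtype)
    out[key] = keys
    # float64 sums are cast to the field's dtype on assignment
    out[value] = sums
    return out

x = np.array([(2, 3), (2, 8), (4, 1)],
             dtype=[("recod", "<u8"), ("count", "<u4")])
merged_x = merge_records(x)
print(merged_x)
```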

There is a recarray function that builds an array from a list of arrays:

In [507]: np.rec.fromarrays([unqA,cnt],dtype=x.dtype)
Out[507]: 
rec.array([(25, 1), (37, 5), (47, 1), (59, 2)], 
      dtype=[('recod', '<u8'), ('count', '<u4')])

Internally it does the same thing: it builds an empty array of the right size and dtype, and then loops over the dtype fields. A recarray is just a structured array in a specialized array subclass wrapper.

There are two ways of populating a structured array (especially one with a diverse dtype): with a list of tuples, as you did with x, or field by field.
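
A short side-by-side sketch of the two ways, using the dtype from the question; the variable names are only for illustration.

```python
import numpy as np

dt = np.dtype([("recod", "<u8"), ("count", "<u4")])

# 1) From a list of tuples, one tuple per record
by_rows = np.array([(25, 1), (37, 5)], dtype=dt)

# 2) Field by field, into a preallocated array
by_fields = np.empty(2, dtype=dt)
by_fields["recod"] = [25, 37]
by_fields["count"] = [1, 5]

print(by_rows)
print(by_fields)
```

Note that the records must be tuples: a list of lists is not interpreted as a sequence of records for a structured dtype.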

hpaulj answered Mar 01 '26 04:03



