
average of all rows corresponding to all unique rows

Tags:

python

numpy

I have a numpy array with two columns:

A = [[1,1,1,2,3,1,2,3],[0.1,0.2,0.2,0.1,0.3,0.2,0.2,0.1]]

For each unique value in the first column, I want the average of the corresponding values in the second. For example:

B = [[1,2,3], [0.175, 0.15, 0.2]]

Is there a pythonic way to do this?

Abhishek Thakur asked Mar 21 '23 07:03



1 Answer

I think the following is the standard numpy approach for this kind of computation. The call to np.unique can be skipped if the entries of A[0] are small non-negative integers, but including it makes the whole operation more robust and independent of the actual data.

>>> import numpy as np
>>> A = [[1,1,1,2,3,1,2,3],[0.1,0.2,0.2,0.1,0.3,0.2,0.2,0.1]]
>>> unq, unq_idx = np.unique(A[0], return_inverse=True)
>>> unq_sum = np.bincount(unq_idx, weights=A[1])
>>> unq_counts = np.bincount(unq_idx)
>>> unq_avg = unq_sum / unq_counts
>>> unq
array([1, 2, 3])
>>> unq_avg
array([ 0.175,  0.15 ,  0.2  ])
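As a rough sketch of the shortcut mentioned above: assuming the keys in A[0] really are small non-negative integers, np.bincount can be applied to them directly, without np.unique. The names keys, vals, sums, counts, present and avg are illustrative and not part of the original answer.

```python
import numpy as np

A = [[1, 1, 1, 2, 3, 1, 2, 3], [0.1, 0.2, 0.2, 0.1, 0.3, 0.2, 0.2, 0.1]]
keys = np.asarray(A[0])  # assumption: small non-negative integers
vals = np.asarray(A[1])

# bincount uses each key as a bin index, so bin k collects every entry with key k.
sums = np.bincount(keys, weights=vals)
counts = np.bincount(keys)

# Keys that never occur (here 0) have a count of zero; drop them before dividing.
present = np.flatnonzero(counts)
avg = sums[present] / counts[present]
```

Here present should match unq and avg should match unq_avg from the session above. With large or negative keys this shortcut either wastes memory or fails, which is why the np.unique version is the safer default.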

You could of course then stack both arrays, although that will convert unq to float dtype:

>>> np.vstack((unq, unq_avg))
array([[ 1.   ,  2.   ,  3.   ],
       [ 0.175,  0.15 ,  0.2  ]])
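If the integer keys should stay integers, one option (a sketch, not part of the original answer) is to pair the two arrays without stacking them, for example in a dict or a NumPy structured array. The names avg_by_key and paired, and the field names "key" and "avg", are made up for illustration.

```python
import numpy as np

A = [[1, 1, 1, 2, 3, 1, 2, 3], [0.1, 0.2, 0.2, 0.1, 0.3, 0.2, 0.2, 0.1]]
unq, unq_idx = np.unique(A[0], return_inverse=True)
unq_avg = np.bincount(unq_idx, weights=A[1]) / np.bincount(unq_idx)

# Plain Python mapping from each unique key to its average; keys stay ints.
avg_by_key = dict(zip(unq.tolist(), unq_avg.tolist()))

# Structured array: each field keeps its own dtype, so "key" is not cast to float.
paired = np.empty(len(unq), dtype=[("key", unq.dtype), ("avg", unq_avg.dtype)])
paired["key"] = unq
paired["avg"] = unq_avg
```

The dict is convenient for lookups by key, while the structured array keeps everything in NumPy for further vectorized work.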
Jaime answered Apr 02 '23 17:04
