Say I have a NumPy array that holds points in 2D space, like the following:
np.array([[3, 2], [4, 4], [5, 4], [4, 2], [4, 6], [9, 5]])
I also have a NumPy array that assigns a label (an integer) to each point. It is a 1D array whose length equals the number of points in the point array.
np.array([0, 1, 1, 0, 2, 1])
Now I want to take the mean of the points that share each label. For example, for all points that have label 0, take the mean of those points. My current solution is:
return np.array([points[labels==i].mean(axis=0) for i in range(k)])
where k is the number of distinct labels, i.e. the largest number in the labels array plus one.
I would like a way to do this without using a for loop. Maybe there is some NumPy functionality I haven't discovered yet?
Approach #1 : We can leverage matrix multiplication with some help from broadcasting -
mask = labels == np.arange(labels.max()+1)[:,None]
out = mask.dot(points)/np.bincount(labels).astype(float)[:,None]
Sample run -
In [36]: points = np.array([[3, 2], [4, 4], [5, 4], [4, 2], [4, 6], [9, 5]])
...: labels = np.array([0, 1, 1, 0, 2, 1])
# Original soln
In [37]: L = labels.max()+1
In [38]: np.array([points[labels==k].mean(axis=0) for k in range(L)])
Out[38]:
array([[3.5       , 2.        ],
       [6.        , 4.33333333],
       [4.        , 6.        ]])
# Proposed soln
In [39]: mask = labels == np.arange(labels.max()+1)[:,None]
...: out = mask.dot(points)/np.bincount(labels).astype(float)[:,None]
In [40]: out
Out[40]:
array([[3.5       , 2.        ],
       [6.        , 4.33333333],
       [4.        , 6.        ]])
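To see why the matrix product gives per-label sums, it may help to name the intermediate arrays. The following is only a sketch of Approach #1 wrapped in a function. The name `label_means_matmul` is hypothetical, and the explicit cast of the mask to float is an assumption made for readability, not part of the original one-liner.

```python
import numpy as np

def label_means_matmul(points, labels):
    # Row k of `mask` is True wherever labels == k, so `mask` has
    # shape (n_labels, n_points) and acts as a membership matrix.
    mask = labels == np.arange(labels.max() + 1)[:, None]
    # Multiplying the membership matrix by the points sums the
    # coordinates of all points that share a label.
    sums = mask.astype(float) @ points
    # bincount gives the number of points per label.
    counts = np.bincount(labels)
    return sums / counts[:, None]

points = np.array([[3, 2], [4, 4], [5, 4], [4, 2], [4, 6], [9, 5]])
labels = np.array([0, 1, 1, 0, 2, 1])
means = label_means_matmul(points, labels)
print(means)
```

Note that the mask holds n_labels x n_points entries, so this approach trades memory for the absence of a Python loop; with many labels and many points the other approaches may be the better fit.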
Approach #2 : With np.add.at -
sums = np.zeros((labels.max()+1,points.shape[1]),dtype=float)
np.add.at(sums,labels,points)
out = sums/np.bincount(labels).astype(float)[:,None]
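Approach #2 is given without a sample run, so here is a quick check on the data from the question, compared against the loop-based version. This is a sketch: the names `n_labels`, `out_add_at` and `expected` are introduced here for illustration only.

```python
import numpy as np

points = np.array([[3, 2], [4, 4], [5, 4], [4, 2], [4, 6], [9, 5]])
labels = np.array([0, 1, 1, 0, 2, 1])

n_labels = labels.max() + 1
sums = np.zeros((n_labels, points.shape[1]), dtype=float)
# np.add.at performs an unbuffered in-place add, so rows of `points`
# that share a label all accumulate into the same row of `sums`.
np.add.at(sums, labels, points)
out_add_at = sums / np.bincount(labels)[:, None]

# Loop-based reference from the question.
expected = np.array([points[labels == k].mean(axis=0) for k in range(n_labels)])
print(np.allclose(out_add_at, expected))
```

The plain fancy-indexed form `sums[labels] += points` would not work here, because with repeated indices only one of the additions per label is kept; `np.add.at` exists for exactly this case.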
Approach #3 : If every integer from 0 to the maximum label is present in labels, we can also use np.add.reduceat -
sidx = labels.argsort()
sorted_points = points[sidx]
sums = np.add.reduceat(sorted_points,np.r_[0,np.bincount(labels)[:-1].cumsum()])
out = sums/np.bincount(labels).astype(float)[:,None]
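Approach #3 can be traced on the question's data as follows. This is a sketch with intermediate names (`counts`, `starts`, `out_reduceat`) added for illustration; the stable sort is an assumption that keeps points in their original order within each label and is not required for the sums.

```python
import numpy as np

points = np.array([[3, 2], [4, 4], [5, 4], [4, 2], [4, 6], [9, 5]])
labels = np.array([0, 1, 1, 0, 2, 1])

counts = np.bincount(labels)               # points per label: [2, 3, 1]
sidx = labels.argsort(kind="stable")       # groups equal labels together
sorted_points = points[sidx]
# Start offset of each label's block within the sorted points: [0, 2, 5]
starts = np.r_[0, counts[:-1].cumsum()]
# Sum each block of rows between consecutive start offsets.
sums = np.add.reduceat(sorted_points, starts, axis=0)
out_reduceat = sums / counts[:, None]
print(out_reduceat)
```

The condition that every label is present matters because a missing label produces two equal start offsets, and `reduceat` then returns the single row at that offset rather than an empty sum, so the result for that label would be wrong (and its count would be zero).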