Numpy split array based on condition without for loop


Say I have a NumPy array that holds points in 2D space, like the following:

np.array([[3, 2], [4, 4], [5, 4], [4, 2], [4, 6], [9, 5]])

I also have a NumPy array that assigns a numeric label to each point. It is a 1D array whose length equals the number of points in the point array.

np.array([0, 1, 1, 0, 2, 1])

Now I want to take the mean of all points that share a label. For example, for all points that have label 0, take the mean of those points. My current solution is the following:

return np.array([points[labels==i].mean(axis=0) for i in range(k)])

where k is the number of distinct labels, i.e. one more than the largest value in the labels array.

I would like a way to do this without a for loop. Is there some NumPy functionality I haven't discovered yet?


asked Feb 26 '19 16:02

Shadesfear

1 Answer

Approach #1 : We can leverage matrix-multiplication with some help from broadcasting -

mask = labels == np.arange(labels.max()+1)[:,None]
out = mask.dot(points)/np.bincount(labels).astype(float)[:,None]

Sample run -

In [36]: points = np.array([[3, 2], [4, 4], [5, 4], [4, 2], [4, 6], [9, 5]]) 
    ...: labels = np.array([0, 1, 1, 0, 2, 1])

# Original soln
In [37]: L = labels.max()+1

In [38]: np.array([points[labels==k].mean(axis=0) for k in range(L)])
Out[38]: 
array([[3.5       , 2.        ],
       [6.        , 4.33333333],
       [4.        , 6.        ]])

# Proposed soln
In [39]: mask = labels == np.arange(labels.max()+1)[:,None]
    ...: out = mask.dot(points)/np.bincount(labels).astype(float)[:,None]

In [40]: out
Out[40]: 
array([[3.5       , 2.        ],
       [6.        , 4.33333333],
       [4.        , 6.        ]])
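To see why the matrix multiplication in Approach #1 produces per-label sums, it may help to look at the intermediate arrays. The sketch below reuses the sample data; the names sums, counts and means are introduced here only for illustration.

```python
import numpy as np

points = np.array([[3, 2], [4, 4], [5, 4], [4, 2], [4, 6], [9, 5]])
labels = np.array([0, 1, 1, 0, 2, 1])

# Broadcasting a column of label ids against the labels row gives a
# boolean mask of shape (n_labels, n_points); row k is True where labels == k.
mask = labels == np.arange(labels.max() + 1)[:, None]
# mask:
# [[ True False False  True False False]
#  [False  True  True False False  True]
#  [False False False False  True False]]

# Each mask row acts as 0/1 weights, so the dot product adds up
# exactly the points that carry that label.
sums = mask.dot(points)        # [[ 7  4] [18 13] [ 4  6]]
counts = mask.sum(axis=1)      # [2 3 1], same values as np.bincount(labels)
means = sums / counts[:, None]
```

Note that the mask holds n_labels * n_points entries, so for many labels and many points this approach can use noticeably more memory than the others.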

Approach #2 : With np.add.at -

sums = np.zeros((labels.max()+1,points.shape[1]),dtype=float)
np.add.at(sums,labels,points)
out = sums/np.bincount(labels).astype(float)[:,None]
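As a rough illustration of why Approach #2 uses np.add.at rather than a plain indexed +=, the sketch below compares the two on the sample data. The names buffered and n_labels are introduced here for illustration only.

```python
import numpy as np

points = np.array([[3, 2], [4, 4], [5, 4], [4, 2], [4, 6], [9, 5]])
labels = np.array([0, 1, 1, 0, 2, 1])
n_labels = labels.max() + 1

# Plain fancy-index addition is buffered: when a label repeats,
# its points are not accumulated into the same row.
buffered = np.zeros((n_labels, points.shape[1]))
buffered[labels] += points

# np.add.at is unbuffered: every point is added to its label's row,
# even when the same label appears several times.
sums = np.zeros((n_labels, points.shape[1]))
np.add.at(sums, labels, points)

out = sums / np.bincount(labels)[:, None]
```

Here sums holds the full per-label totals, while buffered does not, which is why the unbuffered ufunc method is needed for this kind of grouped accumulation.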

Approach #3 : If every integer from 0 to the maximum label is present in labels, we can also use np.add.reduceat -

sidx = labels.argsort()
sorted_points = points[sidx]
sums = np.add.reduceat(sorted_points,np.r_[0,np.bincount(labels)[:-1].cumsum()])
out = sums/np.bincount(labels).astype(float)[:,None]
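The condition on Approach #3 matters because np.add.reduceat does not return an empty sum when two consecutive offsets are equal; it returns the single element at that offset instead. The sketch below uses a small made-up input (not the sample data above) in which label 1 is missing, and compares the result with np.add.at. A stable sort is requested only so that the row order is deterministic.

```python
import numpy as np

# Hypothetical input where label 1 never occurs.
points = np.array([[1, 1], [2, 2], [4, 4], [3, 3]])
labels = np.array([0, 2, 2, 0])

counts = np.bincount(labels)                    # [2 0 2]
offsets = np.r_[0, counts[:-1].cumsum()]        # [0 2 2], offset 2 is repeated

sidx = labels.argsort(kind="stable")
sorted_points = points[sidx]                    # [[1 1] [3 3] [2 2] [4 4]]

# For the repeated offset, reduceat returns sorted_points[2] rather than zeros.
reduceat_sums = np.add.reduceat(sorted_points, offsets)
# [[4 4] [2 2] [6 6]]

# np.add.at leaves the row for the missing label at zero.
add_at_sums = np.zeros((labels.max() + 1, points.shape[1]))
np.add.at(add_at_sums, labels, points)
# [[4. 4.] [0. 0.] [6. 6.]]
```

With a missing label, the division by np.bincount(labels) then divides by zero in either approach, so such rows probably need to be masked or handled separately before computing the means.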


answered Oct 05 '22 23:10

Divakar
