Split an array into bins of equal numbers

I have an unsorted array N. I'd like to keep the original order of N, but replace each element with its bin number, where the values are split into m bins of equal size (if len(N) is divisible by m) or nearly equal size (if it is not). I need a vectorized solution, since N is fairly large and a plain Python loop would be too slow. Is there anything in scipy or numpy that can do this?

e.g.
N = [0.2, 1.5, 0.3, 1.7, 0.5]
m = 2
Desired output: [0, 1, 0, 1, 0]

I've looked at numpy.histogram, but by default it gives me equal-width bins rather than bins with equal counts.

635

asked Nov 30 '16 17:11

max_max_mir

2 Answers

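Here is a vectorized NumPy approach. The idea is to create equally spaced bin boundaries over the length of the input array, use np.searchsorted to assign a bin to each position in sorted order, and then map those bins back to the original order through each element's rank (argsort of argsort). Here's the implementation -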

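import numpy as np

def equal_bin(N, m):
    # Upper boundary of each bin, in units of sorted position
    sep = (N.size/float(m))*np.arange(1,m+1)
    # Bin number for each sorted position 0..N.size-1
    idx = sep.searchsorted(np.arange(N.size))
    return idx[N.argsort().argsort()]  # rank of each element -> its bin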

Sample runs with bin-counting for each bin to verify results -

In [442]: N = np.arange(1,94)

In [443]: np.bincount(equal_bin(N, 4))
Out[443]: array([24, 23, 23, 23])

In [444]: np.bincount(equal_bin(N, 5))
Out[444]: array([19, 19, 18, 19, 18])

In [445]: np.bincount(equal_bin(N, 10))
Out[445]: array([10,  9,  9, 10,  9,  9, 10,  9,  9,  9])
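
The double argsort in the last line of equal_bin is the least obvious step. As a minimal sketch of what it computes, here is the same calculation unrolled on the array from the question. The intermediate names (order, ranks, bins_sorted) are only illustrative.

```python
import numpy as np

N = np.array([0.2, 1.5, 0.3, 1.7, 0.5])
m = 2

# argsort gives the indices that would sort N
order = N.argsort()            # [0, 2, 4, 1, 3]
# argsort of that gives each element's rank (its position in sorted order)
ranks = order.argsort()        # [0, 3, 1, 4, 2]

# Bin number for each sorted position, computed as in equal_bin
sep = (N.size / float(m)) * np.arange(1, m + 1)     # [2.5, 5.0]
bins_sorted = sep.searchsorted(np.arange(N.size))   # [0, 0, 0, 1, 1]

# Look up each element's bin by its rank, which keeps the original order
result = bins_sorted[ranks]    # [0, 1, 0, 1, 0]
print(result)
```

Indexing bins_sorted by rank is what keeps the output aligned with the original, unsorted positions.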

Here's another approach that uses np.linspace to generate the equally spaced bin numbers directly, which can then be indexed by rank, like so -

def equal_bin_v2(N, m):
    # num must be an integer; current NumPy rejects a float count such as N.size+0.5
    idx = np.linspace(0, m, N.size, endpoint=False).astype(int)
    return idx[N.argsort().argsort()]

Sample run -

In [689]: N
Out[689]: array([ 0.2,  1.5,  0.3,  1.7,  0.5])

In [690]: equal_bin_v2(N,2)
Out[690]: array([0, 1, 0, 1, 0])

In [691]: equal_bin_v2(N,3)
Out[691]: array([0, 1, 0, 2, 1])

In [692]: equal_bin_v2(N,4)
Out[692]: array([0, 2, 0, 3, 1])

In [693]: equal_bin_v2(N,5)
Out[693]: array([0, 3, 1, 4, 2])
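
The same assignment can also be written with integer arithmetic on the ranks, which avoids floating-point spacing altogether. This is a variant sketched for illustration, not part of the code above: equal_bin_int is a hypothetical name, and tied values are ordered however argsort happens to return them.

```python
import numpy as np

def equal_bin_int(N, m):
    # Hypothetical helper: bin = floor(rank * m / N.size), using integers only
    N = np.asarray(N)
    ranks = N.argsort().argsort()
    return ranks * m // N.size

N = np.array([0.2, 1.5, 0.3, 1.7, 0.5])
print(equal_bin_int(N, 2))   # [0 1 0 1 0]
print(equal_bin_int(N, 3))   # [0 1 0 2 1]
print(np.bincount(equal_bin_int(np.arange(1, 94), 4)))   # [24 23 23 23]
```

For these inputs it reproduces the outputs shown in the sample runs.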

148

answered Oct 05 '22 23:10

Divakar

pandas.qcut

Another good alternative is pd.qcut from pandas. For example:

In [6]: import pandas as pd
In [7]: N = [0.2, 1.5, 0.3, 1.7, 0.5]
   ...: m = 2

In [8]: pd.qcut(N, m, labels=False)
Out[8]: array([0, 1, 0, 1, 0], dtype=int64)

Tip for getting the bin middle points

If you want the bin intervals (and hence the bin edges) instead of integer bin numbers, leave labels at its default (None). This will allow you to get the bin middle points with:

In [26]: intervals = pd.qcut(N, 2)

In [27]: [i.mid for i in intervals]
Out[27]: [0.34950000000000003, 1.1, 0.34950000000000003, 1.1, 0.34950000000000003]

Here, intervals is a pandas Categorical whose elements are pandas.Interval objects (this is what you get with the default labels=None).

See also: pd.cut, if you would like the bins to have equal width rather than an equal number of elements.
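
To make the difference concrete, here is a small sketch comparing the two on made-up data with a large gap in the values. The data list is only illustrative and is not from the question.

```python
import numpy as np
import pandas as pd

# Illustrative data with a large gap between the small and large values
data = [1, 2, 3, 4, 100, 101]

by_count = pd.qcut(data, 2, labels=False)   # equal number of elements per bin
by_width = pd.cut(data, 2, labels=False)    # equal bin width

print(by_count, np.bincount(by_count))   # [0 0 0 1 1 1] [3 3]
print(by_width, np.bincount(by_width))   # [0 0 0 0 1 1] [4 2]
```

pd.qcut splits at the median, so each bin gets three elements, while pd.cut splits the value range in half and puts four elements in the first bin.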

answered Oct 05 '22 23:10

np8

Tags: arrays, vectorization, numpy, scipy, binning
