
Binning pandas data by top N percent

Tags: python, pandas

I have a pandas Series (part of a larger DataFrame) like the one below:

0        7416
1       10630
2        7086
3        2091
4        3995
5        1304
6         519
7        1262
8        3676
9        2371
10       5346
11        912
12       3653
13       1093
14       2986
15       2951
16      11859

I would like to group rows based on the following quantiles:

Top 0-5%
Top 6-10%
Top 11-25%
Top 26-50%
Top 51-75%
Top 76-100%

I started by calling rank() on the data and planned to then use pd.cut() to cut the ranks into bins, but pd.cut() does not seem to accept top N%; it accepts explicit bin edges. Is there an easy way to do this in pandas, or do I need to write a lambda/apply function that calculates which bin each ranked item should be placed in?

metersk asked Dec 09 '15 16:12




1 Answer

Is this what you had in mind?

pd.qcut(data, [0, 0.05, 0.1, 0.25, 0.5, 0.75, 1])
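
One thing to watch: the quantile edges passed to qcut are measured from the smallest value up, so the "top 5%" is the slice between the 0.95 and 1.0 quantiles, not between 0 and 0.05. Here is a minimal sketch using the series from the question. The edge list and the label strings are my own choice for illustration, not something pandas provides, and I am assuming "top" means the largest values.

```python
import pandas as pd

# The series from the question.
data = pd.Series([7416, 10630, 7086, 2091, 3995, 1304, 519, 1262, 3676,
                  2371, 5346, 912, 3653, 1093, 2986, 2951, 11859])

# Quantile edges counted from the bottom. The first edge must be 0,
# otherwise values below the first quantile come back as NaN.
edges = [0, 0.25, 0.5, 0.75, 0.9, 0.95, 1]

# One label per bin, in the same bottom-to-top order as the edges.
labels = ["Top 76-100%", "Top 51-75%", "Top 26-50%",
          "Top 11-25%", "Top 6-10%", "Top 0-5%"]

binned = pd.qcut(data, q=edges, labels=labels)

# Attach the bin to each value so it can be used in a groupby later.
result = pd.DataFrame({"value": data, "bin": binned})
print(result)
```

The result is a categorical column, so something like result.groupby("bin") should work for per-bin aggregation. With only 17 rows the top two bins hold one row each, and if your real data has many tied values qcut may complain about duplicate edges (the duplicates argument controls that).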
crow_t_robot answered Oct 11 '22 23:10