I have a pandas series (as part of a larger data frame) like the below:
0 7416
1 10630
2 7086
3 2091
4 3995
5 1304
6 519
7 1262
8 3676
9 2371
10 5346
11 912
12 3653
13 1093
14 2986
15 2951
16 11859
I would like to group rows based on the following quantiles:
Top 0-5%
Top 6-10%
Top 11-25%
Top 26-50%
Top 51-75%
Top 76-100%
First I used rank() on the data, and I then planned to use pd.cut() to cut the data into bins, but it does not seem to accept top N%; rather, it accepts explicit bin edges. Is there an easy way to do this in pandas, or do I need to write a lambda/apply function that calculates which bin each of the ranked items should be placed in?
Python's pandas module provides easy ways to aggregate data and calculate metrics. Finding the top 5 values for each group can also be done as part of a group-by. The function that helps here is nlargest().
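As a rough illustration of nlargest() inside a group-by, here is a minimal sketch. The frame and its column names ("group", "value") are made up for the example, and it takes the top 2 per group only to keep the output short; pass 5 for a top 5.

```python
import pandas as pd

# Hypothetical data: the column names "group" and "value" are illustrative only.
df = pd.DataFrame({
    "group": ["a", "a", "a", "b", "b", "b"],
    "value": [7416, 10630, 7086, 2091, 3995, 1304],
})

# Largest 2 values within each group; the result is a Series indexed by
# (group, original row label), ordered largest-first inside each group.
top2 = df.groupby("group")["value"].nlargest(2)
print(top2)
```

Without the group-by, the same method works directly on a series, for example df["value"].nlargest(5).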
In pandas, binning by distance is done with the cut() function. We group the values in the Cupcake column into three groups: small, medium and big. To do so, we need to calculate the interval that each group covers.
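A small sketch of that kind of distance-based binning follows. The Cupcake values below are invented for illustration, and letting cut() derive three equal-width intervals from bins=3 is one possible choice rather than necessarily the one the original example used.

```python
import pandas as pd

# Hypothetical values for the "Cupcake" column; the real data is not shown here.
df = pd.DataFrame({"Cupcake": [5, 12, 20, 33, 41, 58, 64, 79, 90, 100]})

# bins=3 asks cut() to split the range min..max into three equal-width intervals;
# labels names them from the lowest interval to the highest.
df["size"] = pd.cut(df["Cupcake"], bins=3, labels=["small", "medium", "big"])
print(df)
```

Explicit edges can be passed instead of an integer, for example bins=[0, 35, 70, 100], when the interval boundaries are known in advance.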
With cut(), each bin has (approximately) the same size by default, where the size is the difference between the lower and upper bin edges. The qcut function focuses instead on the number of values in each bin: the values are sorted from smallest to largest and split so that each bin holds roughly the same number of them.
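To make the difference concrete, the sketch below applies both functions to the series from the question. The choice of four bins is arbitrary and only for illustration.

```python
import pandas as pd

data = pd.Series([7416, 10630, 7086, 2091, 3995, 1304, 519, 1262, 3676,
                  2371, 5346, 912, 3653, 1093, 2986, 2951, 11859])

# cut: four intervals of equal width, so the counts per bin can be very uneven.
equal_width = pd.cut(data, bins=4)

# qcut: edges are placed at the quartiles, so the counts per bin are roughly equal.
equal_count = pd.qcut(data, q=4)

width_counts = equal_width.value_counts().sort_index()
count_counts = equal_count.value_counts().sort_index()
print(width_counts)
print(count_counts)
```

On this skewed sample most values land in the lowest equal-width bin, while the quantile-based bins stay balanced.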
Is this what you had in mind?
pd.qcut(data, [0, 0.05, 0.1, 0.25, 0.5, 0.75, 1])
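One thing to watch for is that qcut() measures quantiles from the bottom of the distribution, so "top 5%" corresponds to the quantile range 0.95 to 1. The sketch below is one possible reading of the requested groups, assuming "top" means the largest values; the label strings are copied from the question, and the edge list starts at 0 so that no value falls outside the bins.

```python
import pandas as pd

data = pd.Series([7416, 10630, 7086, 2091, 3995, 1304, 519, 1262, 3676,
                  2371, 5346, 912, 3653, 1093, 2986, 2951, 11859])

# Quantile edges counted from the bottom; the last bin (0.95-1) is the top 5%.
edges = [0, 0.25, 0.5, 0.75, 0.9, 0.95, 1]

# Labels run from the lowest bin to the highest, matching the order of the edges.
labels = ["Top 76-100%", "Top 51-75%", "Top 26-50%",
          "Top 11-25%", "Top 6-10%", "Top 0-5%"]

groups = pd.qcut(data, q=edges, labels=labels)
print(pd.DataFrame({"value": data, "group": groups}))
```

Because the bin edges are interpolated quantiles, group sizes on a sample this small only approximate the nominal percentages. An alternative along the lines of the original attempt would be data.rank(ascending=False, pct=True) followed by pd.cut() with edges [0, 0.05, 0.1, 0.25, 0.5, 0.75, 1].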