Binning pandas data by top N percent

Tags:

pandas

I have a pandas Series (part of a larger DataFrame) like the one below:

0        7416
1       10630
2        7086
3        2091
4        3995
5        1304
6         519
7        1262
8        3676
9        2371
10       5346
11        912
12       3653
13       1093
14       2986
15       2951
16      11859

I would like to group rows based on the following quantiles:

Top 0-5%
Top 6-10%
Top 11-25%
Top 26-50%
Top 51-75%
Top 76-100%

I started by calling Series.rank() on the data and planned to then use pd.cut() to cut the data into bins, but pd.cut() does not seem to accept top N%; it accepts explicit bin edges instead. Is there an easy way to do this in pandas, or do I need to write a lambda/apply function that calculates which bin each ranked item belongs in?

asked Dec 09 '15 16:12

metersk

1 Answer

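Is this what you had in mind? pd.qcut() takes quantile edges rather than explicit bin edges. The edges need to start at 0, otherwise values below the first quantile are left unbinned (NaN). Because the groups are "top N%", the edges are measured from the bottom, so the top 5% is everything above the 0.95 quantile:

pd.qcut(data, [0, 0.25, 0.5, 0.75, 0.9, 0.95, 1])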
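To make the result easier to read, the bins can also be given the "Top N%" labels from the question. The following is a minimal sketch, not a definitive implementation. It assumes the quantile edges start at 0 and are measured from the bottom of the distribution, so that "Top 0-5%" means everything above the 0.95 quantile. The names data, edges, labels, binned and counts are illustrative only.

```python
import pandas as pd

# The series from the question.
data = pd.Series([7416, 10630, 7086, 2091, 3995, 1304, 519, 1262, 3676,
                  2371, 5346, 912, 3653, 1093, 2986, 2951, 11859])

# Quantile edges, measured from the bottom of the distribution.
# Starting at 0 ensures the smallest values are binned instead of becoming NaN.
edges = [0, 0.25, 0.5, 0.75, 0.9, 0.95, 1]

# One label per bin, ordered from the lowest bin to the highest
# (assumption: "Top 0-5%" refers to the largest values).
labels = ["Top 76-100%", "Top 51-75%", "Top 26-50%",
          "Top 11-25%", "Top 6-10%", "Top 0-5%"]

# qcut computes the bin edges from the data's own quantiles.
binned = pd.qcut(data, q=edges, labels=labels)

# Number of rows that fall into each group.
counts = binned.value_counts()
print(binned)
print(counts)
```

The resulting categorical can be passed to groupby() on the larger data frame to group rows by bin. Note that pd.qcut() raises an error when tied values produce duplicate bin edges; its duplicates="drop" argument is one way to handle that, at the cost of fewer bins.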

answered Oct 11 '22 23:10

crow_t_robot
