Pandas: group by index value, then calculate quantile?

I have a DataFrame indexed on the month column (set using df = df.set_index('month'), in case that's relevant):

                org_code  ratio_cost
    month
    2010-08-01      1847    8.685939
    2010-08-01      1848    7.883951
    2010-08-01      1849    6.798465
    2010-08-01      1850    7.352603
    2010-09-01      1847    8.778501

I want to add a new column called quantile, which will assign a quantile value to each row, based on the value of its ratio_cost for that month.

So the example above might look like this:

                org_code  ratio_cost  quantile
    month
    2010-08-01      1847    8.685939     100
    2010-08-01      1848    7.883951     66.6
    2010-08-01      1849    6.798465     0
    2010-08-01      1850    7.352603     33.3
    2010-09-01      1847    8.778501     100
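For reference, the quantile values in this example are consistent with (rank - 1) / (n - 1) * 100 computed within each month, where n is the number of rows in that month. That formula is an assumed reading of the example, not something stated in the question. A minimal sketch on the five sample rows, with a single-row month mapped to 100 to match the last row:

```python
import pandas as pd

# The five sample rows from the example, indexed on month.
df = pd.DataFrame(
    {
        "org_code": [1847, 1848, 1849, 1850, 1847],
        "ratio_cost": [8.685939, 7.883951, 6.798465, 7.352603, 8.778501],
    },
    index=pd.to_datetime(
        ["2010-08-01", "2010-08-01", "2010-08-01", "2010-08-01", "2010-09-01"]
    ).rename("month"),
)

grouped = df.groupby(level="month")["ratio_cost"]
rank = grouped.rank()               # 1..n within each month
count = grouped.transform("count")  # n, repeated on every row of the month

# Assumption: scale ranks onto 0..100; a month with one row gets 100.
scaled = (rank - 1) / (count - 1) * 100
df["quantile"] = scaled.where(count > 1, 100.0)
print(df)
```

Note that this scale differs from rank(pct=True), which returns rank / n as a fraction between 0 and 1.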

How can I do this? I've tried this:

    df['quantile'] = df.groupby('month')['ratio_cost'].rank(pct=True)

But I get KeyError: 'month'.

UPDATE: I can reproduce the bug.

Here is my CSV file: http://pastebin.com/raw/6xbjvEL0

And here is the code to reproduce the error:

    import pandas as pd

    df = pd.read_csv('temp.csv')
    df.month = pd.to_datetime(df.month, unit='s')
    df = df.set_index('month')
    df['percentile'] = df.groupby(df.index)['ratio_cost'].rank(pct=True)
    print(df['percentile'])

I'm using Pandas 0.17.1 on OSX.

asked Jan 28 '16 11:01

Richard

1 Answer

You have to sort the index with sort_index before calling rank:

    import pandas as pd

    df = pd.read_csv('http://pastebin.com/raw/6xbjvEL0')

    # month is stored as a Unix timestamp in seconds
    df.month = pd.to_datetime(df.month, unit='s')
    df = df.set_index('month')

    # sort the index before grouping and ranking
    df = df.sort_index()

    df['percentile'] = df.groupby(df.index)['ratio_cost'].rank(pct=True)
    print(df['percentile'].head())

    month
    2010-08-01    0.2500
    2010-08-01    0.6875
    2010-08-01    0.6250
    2010-08-01    0.9375
    2010-08-01    0.7500
    Name: percentile, dtype: float64
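For readers without the CSV, the same steps can be sketched on the five sample rows from the question. The inline data below stands in for the pastebin file and is deliberately out of order. The sketch assumes a reasonably recent pandas and names the index level explicitly with groupby(level='month') instead of passing df.index:

```python
import pandas as pd

# Sample rows from the question, deliberately unsorted (stand-in for the CSV).
df = pd.DataFrame(
    {
        "month": ["2010-09-01", "2010-08-01", "2010-08-01", "2010-08-01", "2010-08-01"],
        "org_code": [1847, 1850, 1847, 1849, 1848],
        "ratio_cost": [8.778501, 7.352603, 8.685939, 6.798465, 7.883951],
    }
)
df["month"] = pd.to_datetime(df["month"])

# Index on month and sort the index before ranking.
df = df.set_index("month").sort_index()

# Group on the index level by name, then rank within each month.
df["percentile"] = df.groupby(level="month")["ratio_cost"].rank(pct=True)
print(df)
```

Within August the four rows get 0.25, 0.5, 0.75 and 1.0 in order of ratio_cost, and the single September row gets 1.0.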

answered Sep 21 '22 00:09

jezrael

Tags: python, pandas, dataframe
