 

Pandas: group by index value, then calculate quantile?

I have a DataFrame indexed on the month column (set using df = df.set_index('month'), in case that's relevant):

                org_code  ratio_cost
    month
    2010-08-01      1847    8.685939
    2010-08-01      1848    7.883951
    2010-08-01      1849    6.798465
    2010-08-01      1850    7.352603
    2010-09-01      1847    8.778501

I want to add a new column called quantile that gives each row a quantile value based on where its ratio_cost falls among the rows for the same month.

So the example above might look like this:

                org_code  ratio_cost  quantile
    month
    2010-08-01      1847    8.685939     100
    2010-08-01      1848    7.883951      66.6
    2010-08-01      1849    6.798465       0
    2010-08-01      1850    7.352603      33.3
    2010-09-01      1847    8.778501     100
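The expected values above appear to follow (rank - 1) / (n - 1) * 100 within each month, where n is the number of rows in that month: the lowest ratio_cost maps to 0 and the highest to 100. A minimal sketch that reproduces that table, assuming the five sample rows are typed in directly and that a month with a single row should get 100 (as the 2010-09-01 row does above):

```python
import pandas as pd

# The sample rows from the question, typed in directly.
df = pd.DataFrame(
    {
        "month": pd.to_datetime(["2010-08-01"] * 4 + ["2010-09-01"]),
        "org_code": [1847, 1848, 1849, 1850, 1847],
        "ratio_cost": [8.685939, 7.883951, 6.798465, 7.352603, 8.778501],
    }
).set_index("month")

g = df.groupby(level="month")["ratio_cost"]
ranks = g.rank()               # 1 = lowest ratio_cost in its month, n = highest
counts = g.transform("count")  # n, the number of rows in each row's month

# Scale ranks so the lowest value in a month is 0 and the highest is 100.
# Assumption: a single-row month (0 / 0 -> NaN) gets 100, as in the expected output.
df["quantile"] = ((ranks - 1) / (counts - 1)).fillna(1.0) * 100
print(df.round(1))
```

Ties get the average rank (rank()'s default method='average'); pass method='min' or method='max' to change that.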

How can I do this? I've tried this:

df['quantile'] = df.groupby('month')['ratio_cost'].rank(pct=True) 

But I get KeyError: 'month'.
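Grouping on the index level with level= instead of a plain string sidesteps the KeyError on pandas versions that only match strings against column names. A sketch using the question's sample rows typed in directly; note that rank(pct=True) returns rank / n, so within a four-row month the result runs from 0.25 to 1.0 rather than the 0 to 100 scale in the expected output:

```python
import pandas as pd

# The sample rows from the question, typed in directly.
df = pd.DataFrame(
    {
        "month": pd.to_datetime(["2010-08-01"] * 4 + ["2010-09-01"]),
        "org_code": [1847, 1848, 1849, 1850, 1847],
        "ratio_cost": [8.685939, 7.883951, 6.798465, 7.352603, 8.778501],
    }
).set_index("month")

# level="month" (or level=0) groups on the index level rather than on a column.
# pct=True divides each row's rank by the number of rows in its month.
df["quantile"] = df.groupby(level="month")["ratio_cost"].rank(pct=True)
print(df)
```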

UPDATE: I can reproduce the bug.

Here is my CSV file: http://pastebin.com/raw/6xbjvEL0

And here is the code to reproduce the error:

    import pandas as pd

    df = pd.read_csv('temp.csv')
    # unit='s': the month column holds Unix timestamps in seconds
    df.month = pd.to_datetime(df.month, unit='s')
    df = df.set_index('month')
    df['percentile'] = df.groupby(df.index)['ratio_cost'].rank(pct=True)
    print(df['percentile'])

I'm using Pandas 0.17.1 on OSX.

Richard asked Jan 28 '16 11:01


People also ask

How do pandas calculate quantile?

The DataFrame.quantile() method returns the value at quantile q (default 0.5, the median) along an axis. With the default axis=0 (axis='index') it computes one quantile per column; with axis=1 (axis='columns') it computes one quantile per row and returns a Series indexed by the row labels. It gives the value at a given probability, not the percentile rank of each individual value.
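A small sketch of the two axes, on a hypothetical toy frame (the column names a and b are made up for illustration):

```python
import pandas as pd

# Hypothetical toy data.
df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [10, 20, 30, 40]})

# Default axis=0: one median per column, linearly interpolated.
col_medians = df.quantile(0.5)
# axis="columns": one median per row, indexed by the row labels.
row_medians = df.quantile(0.5, axis="columns")

print(col_medians)
print(row_medians)
```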

Can you Groupby index in pandas?

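Pass the index name to DataFrame.groupby() to group rows by that index level; the by parameter takes a string or a list of strings naming columns or index levels. Matching index level names this way requires pandas 0.20 or later. On older versions, such as the 0.17.1 used in the question, a string is looked up only among the columns (hence KeyError: 'month'), so use groupby(level='month') or groupby(df.index) instead.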
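A sketch comparing the three ways of grouping on an index, on hypothetical toy data; the string form on the last line assumes pandas 0.20 or later:

```python
import pandas as pd

# Hypothetical toy data with a named DatetimeIndex.
df = pd.DataFrame(
    {
        "month": pd.to_datetime(["2010-08-01", "2010-08-01", "2010-09-01"]),
        "ratio_cost": [8.0, 6.0, 9.0],
    }
).set_index("month")

# Group on the index level by name (level=0 works too).
by_level = df.groupby(level="month")["ratio_cost"].mean()
# Group on the index values themselves.
by_index = df.groupby(df.index)["ratio_cost"].mean()
# Plain string: resolved as an index level name only from pandas 0.20 onwards.
by_name = df.groupby("month")["ratio_cost"].mean()

print(by_level)
```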

How do you get index after Groupby pandas?

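pandas' groupby() splits the rows into groups by one or more keys, after which you apply an aggregation such as mean(), median() or value_counts() to each group. The group keys become the index of the result; call reset_index() to turn them back into ordinary columns, or pass as_index=False to groupby() to get the same flat result directly.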
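A minimal sketch of both options on hypothetical toy data:

```python
import pandas as pd

# Hypothetical toy data.
df = pd.DataFrame({"org_code": [1847, 1848, 1847], "ratio_cost": [8.0, 7.0, 9.0]})

means = df.groupby("org_code")["ratio_cost"].mean()  # Series indexed by org_code
flat = means.reset_index()                            # org_code back as a column

# The same flat shape in one step.
flat_direct = df.groupby("org_code", as_index=False)["ratio_cost"].mean()

print(flat)
```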


1 Answer

You have to sort_index before rank:

    import pandas as pd

    df = pd.read_csv('http://pastebin.com/raw/6xbjvEL0')
    df.month = pd.to_datetime(df.month, unit='s')
    df = df.set_index('month')

    # sort by the month index before grouping and ranking
    df = df.sort_index()

    df['percentile'] = df.groupby(df.index)['ratio_cost'].rank(pct=True)
    print(df['percentile'].head())

    month
    2010-08-01    0.2500
    2010-08-01    0.6875
    2010-08-01    0.6250
    2010-08-01    0.9375
    2010-08-01    0.7500
    Name: percentile, dtype: float64
jezrael answered Sep 21 '22 00:09