I have a DataFrame indexed on the month column (set using df = df.set_index('month'), in case that's relevant): 
                org_code  ratio_cost
    month
    2010-08-01      1847    8.685939
    2010-08-01      1848    7.883951
    2010-08-01      1849    6.798465
    2010-08-01      1850    7.352603
    2010-09-01      1847    8.778501

I want to add a new column called quantile, which will assign a quantile value to each row, based on the value of its ratio_cost for that month.
So the example above might look like this:
                org_code  ratio_cost  quantile
    month
    2010-08-01      1847    8.685939     100
    2010-08-01      1848    7.883951     66.6
    2010-08-01      1849    6.798465     0
    2010-08-01      1850    7.352603     33.3
    2010-09-01      1847    8.778501     100

How can I do this? I've tried this:
    df['quantile'] = df.groupby('month')['ratio_cost'].rank(pct=True)

But I get KeyError: 'month'.
UPDATE: I can reproduce the bug.
Here is my CSV file: http://pastebin.com/raw/6xbjvEL0
And here is the code to reproduce the error:
    import pandas as pd

    df = pd.read_csv('temp.csv')
    df.month = pd.to_datetime(df.month, unit='s')
    df = df.set_index('month')
    df['percentile'] = df.groupby(df.index)['ratio_cost'].rank(pct=True)
    print(df['percentile'])

I'm using Pandas 0.17.1 on OSX.
Pandas DataFrame quantile() method
The quantile() method calculates the quantile of the values along a given axis. With the default axis (axis=0, or 'index') it returns one value per column. With axis='columns' it computes the quantile across the columns and returns one value per row. The default is q=0.5, which is the median, not the mean.
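As a rough illustration of the two axes, here is a minimal sketch on a small made-up frame (the `scores` data and its column names `a` and `b` are hypothetical, not taken from the question):

```python
import pandas as pd

# Hypothetical data, only for illustrating the axis argument.
scores = pd.DataFrame({"a": [1.0, 2.0, 3.0, 4.0],
                       "b": [10.0, 20.0, 30.0, 40.0]})

# Default: q=0.5 along axis=0, i.e. the median of each column.
col_median = scores.quantile()

# 25% quantile computed across the columns, one value per row.
row_q25 = scores.quantile(0.25, axis="columns")

print(col_median)
print(row_q25)
```

Note that quantile() returns the value at a given quantile; it does not tell you which quantile each row falls into, which is what the question asks for. That is why the answer uses rank(pct=True) instead.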
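How to perform groupby on the index in pandas? Pass the index level to groupby(), either as level='month' or as the index itself (df.groupby(df.index)). DataFrame.groupby() takes a label or a list of labels naming the columns to group by. Recent pandas versions also accept an index level name there, while older ones such as 0.17.1 look the string up among the columns only, which is why groupby('month') raises KeyError: 'month' once month has become the index.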
The pandas groupby() method is versatile. It splits the data into groups by some key; an aggregation such as mean(), median() or value_counts() is then applied to each group. After an aggregation the group keys become the index of the result, so call reset_index() to turn them back into ordinary columns.
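Grouping on the index and then resetting it might look like the sketch below. The three-row frame is made up for illustration; only the index name `month` and the column name `ratio_cost` come from the question.

```python
import pandas as pd

# Hypothetical sample with 'month' as the index, as in the question.
df = pd.DataFrame(
    {"ratio_cost": [8.0, 6.0, 9.0]},
    index=pd.DatetimeIndex(["2010-08-01", "2010-08-01", "2010-09-01"],
                           name="month"),
)

# Two ways to group on the index that do not depend on column lookup.
by_level = df.groupby(level="month")["ratio_cost"].mean()
by_index = df.groupby(df.index)["ratio_cost"].mean()

# The group keys are now the index; reset_index() makes them a column again.
flat = by_level.reset_index()
print(flat)
```

Using level= or df.index avoids the KeyError from the question, because neither form asks pandas to find a column called month.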
You have to sort_index before rank:
    import pandas as pd

    df = pd.read_csv('http://pastebin.com/raw/6xbjvEL0')

    df.month = pd.to_datetime(df.month, unit='s')
    df = df.set_index('month')

    # sort the index so that rows of the same month are adjacent
    df = df.sort_index()

    # rank ratio_cost within each month, as a fraction of the group size
    df['percentile'] = df.groupby(df.index)['ratio_cost'].rank(pct=True)
    print(df['percentile'].head())

    month
    2010-08-01    0.2500
    2010-08-01    0.6875
    2010-08-01    0.6250
    2010-08-01    0.9375
    2010-08-01    0.7500
    Name: percentile, dtype: float64