I have a DataFrame indexed on the month column (set using df = df.set_index('month'), in case that's relevant):
            org_code  ratio_cost
month
2010-08-01      1847    8.685939
2010-08-01      1848    7.883951
2010-08-01      1849    6.798465
2010-08-01      1850    7.352603
2010-09-01      1847    8.778501
I want to add a new column called quantile, which will assign a quantile value to each row, based on the value of its ratio_cost for that month.
So the example above might look like this:
            org_code  ratio_cost  quantile
month
2010-08-01      1847    8.685939     100
2010-08-01      1848    7.883951     66.6
2010-08-01      1849    6.798465     0
2010-08-01      1850    7.352603     33.3
2010-09-01      1847    8.778501     100
How can I do this? I've tried this:
df['quantile'] = df.groupby('month')['ratio_cost'].rank(pct=True)
But I get KeyError: 'month'.
UPDATE: I can reproduce the bug.
Here is my CSV file: http://pastebin.com/raw/6xbjvEL0
And here is the code to reproduce the error:
import pandas as pd

df = pd.read_csv('temp.csv')
df.month = pd.to_datetime(df.month, unit='s')
df = df.set_index('month')
df['percentile'] = df.groupby(df.index)['ratio_cost'].rank(pct=True)
print(df['percentile'])
I'm using Pandas 0.17.1 on OSX.
Pandas DataFrame quantile() method: The quantile() method calculates the quantile of the values along a given axis. With the default axis (axis=0, the index) it computes one quantile per column. With axis='columns' it computes the quantile across the columns and returns one value per row. By default this is the median (q=0.5), not the mean.
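To make the axis behaviour concrete, here is a minimal sketch. The frame and its values are made up for illustration and are not from the question's data.

```python
import pandas as pd

# Hypothetical data, chosen so that the median and the mean of each row differ.
df = pd.DataFrame(
    {"a": [1.0, 2.0, 3.0], "b": [10.0, 20.0, 30.0], "c": [100.0, 200.0, 300.0]}
)

# Default axis (axis=0 / 'index'): one median per column.
per_column = df.quantile(0.5)

# axis='columns': one median per row, computed across the columns.
per_row = df.quantile(0.5, axis="columns")

print(per_column)
print(per_row)
```

Note that quantile() returns the value at a given quantile. It does not tell you where each row falls within its group, which is what rank(pct=True) is for.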
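How to perform groupby on the index in pandas? Pass the index name of the DataFrame to groupby(), either as the key or via level=, to group rows on the index. DataFrame.groupby() takes a string or list to specify the group columns or index levels. In older versions such as the 0.17.1 used above, passing the index name as the key raises KeyError, so use groupby(level='month') or groupby(df.index) there.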
The pandas groupby() method is versatile. It splits the data into groups based on some criteria, after which an aggregation such as mean, median, or value_counts can be applied to each group. To turn the group keys back into ordinary columns after groupby(), use reset_index().
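A small sketch of grouping on a named index and then flattening the result. The three rows are illustrative values, not the question's CSV, and level="month" assumes the index is named month as in the question.

```python
import pandas as pd

# Illustrative rows only; the index is named 'month' as in the question.
df = pd.DataFrame(
    {"org_code": [1847, 1848, 1847], "ratio_cost": [8.0, 6.0, 9.0]},
    index=pd.to_datetime(["2010-08-01", "2010-08-01", "2010-09-01"]),
)
df.index.name = "month"

# Group on the index level by name, so no 'month' column is looked up.
monthly_mean = df.groupby(level="month")["ratio_cost"].mean()

# reset_index() turns the group key back into an ordinary column.
flat = monthly_mean.reset_index()
print(flat)
```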
You have to call sort_index before rank:
import pandas as pd

df = pd.read_csv('http://pastebin.com/raw/6xbjvEL0')
df.month = pd.to_datetime(df.month, unit='s')
df = df.set_index('month')
df = df.sort_index()
df['percentile'] = df.groupby(df.index)['ratio_cost'].rank(pct=True)
print(df['percentile'].head())

month
2010-08-01    0.2500
2010-08-01    0.6875
2010-08-01    0.6250
2010-08-01    0.9375
2010-08-01    0.7500
Name: percentile, dtype: float64
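For a version that does not depend on the CSV, here is a minimal sketch built from the five sample rows in the question. It assumes a reasonably recent pandas. The 0 to 100 rescaling, and the choice to treat a month with a single row as 100, are my assumptions to match the example output in the question, not something rank() does by itself.

```python
import pandas as pd

# The five sample rows from the question, indexed and sorted by month.
df = pd.DataFrame(
    {
        "month": pd.to_datetime(["2010-08-01"] * 4 + ["2010-09-01"]),
        "org_code": [1847, 1848, 1849, 1850, 1847],
        "ratio_cost": [8.685939, 7.883951, 6.798465, 7.352603, 8.778501],
    }
).set_index("month").sort_index()

grouped = df.groupby(level="month")["ratio_cost"]

# rank(pct=True): rank within the month divided by the number of rows in it.
pct = grouped.rank(pct=True)

# Assumed 0-100 scale from the question: (rank - 1) / (n - 1) * 100.
# A month with one row gives 0/0 = NaN; filling with 100 is an assumption.
rank = grouped.rank()
count = grouped.transform("count")
scaled = ((rank - 1) / (count - 1) * 100).fillna(100)

df["percentile"] = pct
df["quantile"] = scaled
print(df)
```

Note that rank(pct=True) alone gives 0.25, 0.5, 0.75 and 1.0 for the four August rows, so the lowest value is never 0. The rescaled column is what yields 0, 33.3, 66.6 and 100.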