Pandas: groupby and get median value by month?

Question

I have a data frame that looks like this:

     org        date     value
0    00C  2013-04-01  0.092535
1    00D  2013-04-01  0.114941
2    00F  2013-04-01  0.102794
3    00G  2013-04-01  0.099421
4    00H  2013-04-01  0.114983

Now I want to figure out:

the median value for each organisation in each month of the year
X for each organisation, where X is the percentage difference between the lowest and the highest median monthly value.

What's the best way to approach this in Pandas?

I am trying to generate the medians by month as follows, but it's failing:

df['date'] = pd.to_datetime(df['date'])
ave = df.groupby(['row_id', 'date.month']).median()

This fails with KeyError: 'date.month'.

UPDATE: Thanks to @EdChum I'm now doing:

ave = df.groupby([df['row_id'], df['date'].dt.month]).median()

which works great and gives me:

99P    1     0.106975
       2     0.091344
       3     0.098958
       4     0.092400
       5     0.087996
       6     0.081632
       7     0.083592
       8     0.075258
       9     0.080393
       10    0.089634
       11    0.085679
       12    0.108039
99Q    1     0.110889
       2     0.094837
       3     0.100658
       4     0.091641
       5     0.088971
       6     0.083329
       7     0.086465
       8     0.078368
       9     0.082947
       10    0.090943
       11    0.086343
       12    0.109408

Now I guess, for each item in the index, I need to find the min and max calculated values, then the difference between them. What is the best way to do that?

EdChum · Accepted Answer

Your first attempt fails with a KeyError rather than a syntax error: groupby accepts either a list of column names or the columns (Series) themselves. You passed a list of names, and there is no column called 'date.month', so the lookup fails. Pass the Series instead:

ave = df.groupby([df['row_id'], df['date'].dt.month]).median()
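To make the difference between the two calls concrete, here is a minimal sketch on a small made-up frame. The sample values and the `month` label are assumptions for illustration only; the column names `row_id`, `date` and `value` follow the snippets in the question.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "row_id": ["00C", "00C", "00C", "00D", "00D", "00D"],
    "date": ["2013-04-01", "2013-04-15", "2013-05-01",
             "2013-04-01", "2013-05-01", "2013-05-20"],
    "value": [0.10, 0.20, 0.30, 0.40, 0.50, 0.70],
})
df["date"] = pd.to_datetime(df["date"])

# A string key is looked up as a column name, so 'date.month' is not found.
raised_key_error = False
try:
    df.groupby(["row_id", "date.month"])
except KeyError:
    raised_key_error = True

# Passing Series works: the month numbers are computed first, then used as keys.
# Renaming the Series gives the second index level a readable name.
month = df["date"].dt.month.rename("month")
ave = df.groupby([df["row_id"], month])["value"].median()
print(ave)
```

Selecting `["value"]` before `.median()` is a choice made here so that only the numeric column is aggregated; the result is a Series indexed by (`row_id`, `month`).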

After that you can calculate the min and max for a specific index level:

((ave.max(level=0) - ave.min(level=0))/ave.max(level=0)) * 100

should give you what you want.

This takes the difference between the highest and lowest monthly median for each organisation, divides it by the highest, and multiplies by 100 to express the result as a percentage.
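Note that the `level=` argument to `max` and `min` was deprecated in pandas 1.3 and removed in pandas 2.0, so on a current install the expression above will likely raise a TypeError. A sketch of the same calculation using `groupby(level=0)` is below; the monthly medians are made up for illustration, and the index names `row_id` and `month` are assumptions.

```python
import pandas as pd

# Hypothetical monthly medians, shaped like the grouped output in the question:
# first index level is the organisation, second is the month number.
ave = pd.Series(
    [0.10, 0.08, 0.05, 0.20, 0.15, 0.16],
    index=pd.MultiIndex.from_product(
        [["99P", "99Q"], [1, 2, 3]], names=["row_id", "month"]
    ),
    name="value",
)

# Group on the first index level (the organisation) and take max/min per group.
by_org = ave.groupby(level=0)
highest = by_org.max()
lowest = by_org.min()

# Percentage difference relative to the highest monthly median.
pct_diff = (highest - lowest) / highest * 100
print(pct_diff)
```

The same pattern should work if `ave` is a DataFrame rather than a Series, since `groupby(level=0)` is available on both.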

Tags: python, pandas

Richard