
Pandas: groupby and get median value by month?

Tags: python, pandas

I have a data frame that looks like this:

     org        date     value
0    00C  2013-04-01  0.092535
1    00D  2013-04-01  0.114941
2    00F  2013-04-01  0.102794
3    00G  2013-04-01  0.099421
4    00H  2013-04-01  0.114983

Now I want to figure out:

  • the median value for each organisation in each month of the year
  • X for each organisation, where X is the percentage difference between the lowest and the highest median monthly value.

What's the best way to approach this in Pandas?

I am trying to generate the medians by month as follows, but it's failing:

df['date'] = pd.to_datetime(df['date'])
ave = df.groupby(['row_id', 'date.month']).median()

This fails with KeyError: 'date.month'.

UPDATE: Thanks to @EdChum I'm now doing:

ave = df.groupby([df['row_id'], df['date'].dt.month]).median()

which works great and gives me:

99P    1     0.106975
       2     0.091344
       3     0.098958
       4     0.092400
       5     0.087996
       6     0.081632
       7     0.083592
       8     0.075258
       9     0.080393
       10    0.089634
       11    0.085679
       12    0.108039
99Q    1     0.110889
       2     0.094837
       3     0.100658
       4     0.091641
       5     0.088971
       6     0.083329
       7     0.086465
       8     0.078368
       9     0.082947
       10    0.090943
       11    0.086343
       12    0.109408

Now I guess, for each item in the index, I need to find the min and max calculated values, then the difference between them. What is the best way to do that?

Richard asked Sep 26 '22 09:09


1 Answer

Your first attempt fails with a KeyError, not a syntax error. groupby accepts either a list of column names or the columns (Series) themselves. You passed a list of names, and there is no column named 'date.month', so pass the Series instead:

ave = df.groupby([df['row_id'], df['date'].dt.month]).median()
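A minimal, self-contained sketch of that grouping follows. The rows are made up for illustration, and the organisation column is named `org` as in the sample frame (the question's code calls it `row_id`). Selecting `value` before calling `median()` and renaming the month level are my own additions, not part of the original answer; they keep the aggregation to the numeric column and make the index easier to read.

```python
import pandas as pd

# Made-up rows for illustration only; they are not the asker's data.
df = pd.DataFrame({
    "org": ["00C", "00C", "00C", "00C", "00D", "00D", "00D"],
    "date": ["2013-04-01", "2013-04-15", "2013-04-20", "2013-05-01",
             "2013-04-01", "2013-05-01", "2013-05-10"],
    "value": [0.10, 0.30, 0.20, 0.40, 0.50, 0.10, 0.30],
})
df["date"] = pd.to_datetime(df["date"])

# A column name and a derived Series can be mixed in the list of keys.
# rename("month") only gives the second index level a clearer name.
month = df["date"].dt.month.rename("month")
ave = df.groupby(["org", month])["value"].median()

print(ave)
```

Note that `dt.month` pools the same calendar month across years. If the data spans several years and that is not wanted, adding `df["date"].dt.year` as a further key is one option.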

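After that you can calculate the min and max for a specific index level. Older pandas versions accepted ave.max(level=0) and ave.min(level=0), but the level argument was removed in pandas 2.0, so group on the index level instead:

((ave.groupby(level=0).max() - ave.groupby(level=0).min()) / ave.groupby(level=0).max()) * 100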

should give you what you want.

This calculates the difference between the highest and lowest monthly median for each organisation, divides it by that organisation's highest median, and multiplies by 100 to give a percentage.
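As a rough worked example of that calculation, the sketch below builds a small, hypothetical series of monthly medians indexed by organisation and month, then computes the spread per organisation. It uses `groupby(level=0)`, which I am assuming is acceptable in place of the `level=` argument to `max` and `min`, since the latter is not available in pandas 2.0 and later. The numbers and the names `by_org` and `pct_diff` are invented for illustration.

```python
import pandas as pd

# Hypothetical monthly medians indexed by (org, month); not the asker's data.
ave = pd.Series(
    [0.10, 0.08, 0.05, 0.20, 0.15, 0.16],
    index=pd.MultiIndex.from_product(
        [["99P", "99Q"], [1, 2, 3]], names=["org", "month"]
    ),
    name="value",
)

# Group on the first index level (the organisation) and compare extremes.
by_org = ave.groupby(level=0)
pct_diff = (by_org.max() - by_org.min()) / by_org.max() * 100

print(pct_diff)
```

Here 99P ranges from 0.05 to 0.10, so its spread is about 50% of the maximum, and 99Q ranges from 0.15 to 0.20, giving about 25%. Dividing by the maximum follows the answer; dividing by the minimum instead would give the increase relative to the lowest month.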

EdChum answered Sep 28 '22 04:09