
Grouping by a column and then drawing a boxplot per group in pandas

Tags:

pandas

boxplot

I have a large dataframe that I would like to group by one column, and then examine the distribution within each group graphically using a boxplot. I found that df.boxplot() draws one box per column of the dataframe and puts them all in one plot, which is exactly what I need.

The problem is that after a groupby operation, my data is all in one column with the group labels in the index, so I can't call boxplot on the result.

Here is an example:

import numpy as np
import pandas as pd
df = pd.DataFrame({'a': np.random.rand(10), 'b': [x % 2 for x in range(10)]})
df

            a    b
0    0.273548    0
1    0.378765    1
2    0.190848    0
3    0.646606    1
4    0.562591    0
5    0.409250    1
6    0.637074    0
7    0.946864    1
8    0.203656    0
9    0.276929    1

Now I want to group by column b and boxplot the distribution of both groups in one boxplot. How can I do that?

idoda asked Dec 19 '13 11:12



1 Answer

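You can use the by argument of boxplot: it groups the values of the plotted column by another column and draws one box per group in a single plot. Is that what you are looking for?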

df.boxplot(column='a', by='b')
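
For completeness, here is a self-contained sketch of the same call that can be run as a script. The seed, the Agg backend, and the explicit `fig`/`ax` pair are my own assumptions for reproducibility and are not part of the question; the `summary` table is only there to check the per-group numbers behind the boxes.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Same shape of data as in the question; the seed is arbitrary.
rng = np.random.default_rng(0)
df = pd.DataFrame({"a": rng.random(10), "b": [x % 2 for x in range(10)]})

# Draw one box per distinct value of 'b' on a single set of axes.
fig, ax = plt.subplots()
df.boxplot(column="a", by="b", ax=ax)

# Per-group summary statistics (count, mean, quartiles, ...) for comparison.
summary = df.groupby("b")["a"].describe()
print(summary)
```

Note that no groupby call is needed before plotting: `by='b'` does the grouping on the original dataframe, so the group labels never have to be moved out of the index.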
joris answered Sep 27 '22 17:09
