
Boxplot with pandas groupby multiindex, for specified sublevels from multiindex

OK, so I have a DataFrame containing time series data with a multi-level index (MultiIndex) on its columns. Here is a sample of what the data looks like; it is stored in CSV format, and loading the data is not an issue here.

[image: sample of the data]

What I want to do is create a boxplot with this data grouped according to the categories in a specific level of the MultiIndex. For example, if I were to group by 'SPECIES' I would have the groups 'aq', 'gr', 'mix' and 'sed', and a box for each group at a specific time in the time series.

I've tried this:

grouped = data['2013-08-17'].groupby(axis=1, level='SPECIES')
grouped.boxplot()

but it gives me a boxplot (flat line) for each point in the group rather than for the grouped set. Is there an easy way to do this? I don't have any problems grouping as I can aggregate the groups any which way I want, but I can't get them to boxplot.

pbreach asked Aug 28 '13 21:08



1 Answer

This code:

data['2013-08-17'].boxplot(by='SPECIES') 

will not work, because boxplot is a DataFrame method, not a Series method.

In pandas > 0.18.1, however, boxplot takes a column argument that specifies which column the data is taken from.

So

data.boxplot(column='2013-08-17',by='SPECIES') 

should return the desired result.
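For the column-MultiIndex layout described in the question, one possible route is to take the row for a single date, turn the column levels into ordinary columns, and then group with by. This is only a sketch on made-up data: the random values, the 'SITE' level and the column name 'value' are assumptions for illustration; only 'SPECIES' and its four categories come from the question.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's data: dates on the rows and a
# two-level column MultiIndex (SITE, SPECIES) on the columns.
rng = np.random.default_rng(0)
species = ["aq", "gr", "mix", "sed"]
columns = pd.MultiIndex.from_product(
    [[f"site{i}" for i in range(5)], species], names=["SITE", "SPECIES"]
)
index = pd.date_range("2013-08-15", periods=5, freq="D")
data = pd.DataFrame(
    rng.normal(size=(len(index), len(columns))), index=index, columns=columns
)

# Selecting one date gives a Series indexed by the column MultiIndex.
row = data.loc[pd.Timestamp("2013-08-17")]

# Move the index levels into ordinary columns: one row per (SITE, SPECIES) pair.
long = row.rename("value").reset_index()

# One box per SPECIES category for that date.
fig, ax = plt.subplots(figsize=(8, 6))
long.boxplot(column="value", by="SPECIES", ax=ax)
fig.suptitle("")  # clear the automatic "Boxplot grouped by SPECIES" title
```

Reshaping first means by='SPECIES' refers to a real column, which is what DataFrame.boxplot expects.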

An example with the Iris dataset:

import pandas as pd
import matplotlib.pyplot as plt

# Load the iris dataset from the pandas test data
data = pd.read_csv('https://raw.githubusercontent.com/pandas-dev/pandas/master/pandas/tests/io/data/csv/iris.csv')
fig, ax = plt.subplots(figsize=(10, 8))
data.boxplot(column=['SepalLength'], by='Name', ax=ax)
# pandas sets the figure title inside boxplot(), so clear it afterwards
plt.suptitle('')

creates:

[image: boxplot of the iris dataset with pandas]

plt.suptitle('')

turns off the annoying automatic figure title ("Boxplot grouped by Name") that pandas adds. And of course the column argument also accepts a list of columns, so

data.boxplot(column=['SepalLength', 'SepalWidth'], by='Name', ax=ax) 

also works.
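A self-contained variant of the multi-column call, for trying it without downloading the CSV. The data here is synthetic (random numbers and invented group names standing in for the iris file), and no ax is passed, so pandas creates one subplot per column itself.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for the iris data: three groups of 30 rows each.
rng = np.random.default_rng(1)
names = np.repeat(["setosa", "versicolor", "virginica"], 30)
df = pd.DataFrame({
    "Name": names,
    "SepalLength": rng.normal(5.8, 0.8, size=names.size),
    "SepalWidth": rng.normal(3.0, 0.4, size=names.size),
})

# With a list of columns and by=..., pandas draws one subplot per column.
axes = df.boxplot(column=["SepalLength", "SepalWidth"], by="Name", figsize=(10, 5))
plt.suptitle("")  # clear the automatic figure title after plotting

# Each subplot is titled with the column it shows.
titles = [a.get_title() for a in np.ravel(axes)]
```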

schlump answered Oct 11 '22 14:10