Ok, so I have a dataframe containing timeseries data with a multi-line (multi-level) index on its columns. Here is a sample of what the data looks like; it is in csv format. Loading the data is not an issue here.
What I want to do is create a boxplot with this data grouped according to the different categories in a specific line of the multiindex. For example, if I were to group by 'SPECIES' I would have the groups 'aq', 'gr', 'mix', 'sed', and a box for each group at a specific time in the timeseries.
I've tried this:
grouped = data['2013-08-17'].groupby(axis=1, level='SPECIES')
grouped.boxplot()
but it gives me a boxplot (flat line) for each point in the group rather than for the grouped set. Is there an easy way to do this? I don't have any problems grouping as I can aggregate the groups any which way I want, but I can't get them to boxplot.
This code:
data['2013-08-17'].boxplot(by='SPECIES')
will not work, as boxplot is a function for a DataFrame and not a Series.
In Pandas > 0.18.1, however, the boxplot function has the argument column,
which defines which column the data is taken from.
So
data.boxplot(column='2013-08-17', by='SPECIES')
should return the desired result.
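Since by= groups on a column of the DataFrame, data whose SPECIES labels live in the column MultiIndex probably needs reshaping first. Below is a minimal sketch of that idea, not a tested solution for the asker's file. It assumes dates as rows and two column levels; the level name SITE, the synthetic values, and the helper names row and long_form are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's data: dates as rows,
# two-level columns (SITE, SPECIES). Only 'SPECIES' comes from the question.
rng = np.random.default_rng(0)
columns = pd.MultiIndex.from_product(
    [["s1", "s2", "s3"], ["aq", "gr", "mix", "sed"]],
    names=["SITE", "SPECIES"],
)
index = pd.date_range("2013-08-15", periods=5, freq="D")
data = pd.DataFrame(rng.normal(size=(5, 12)), index=index, columns=columns)

# Select one timestamp: the result is a Series indexed by (SITE, SPECIES).
row = data.loc[pd.Timestamp("2013-08-17")]

# Move the index levels into ordinary columns so that 'SPECIES'
# can be passed to by=.
long_form = row.rename("value").reset_index()

# One box per SPECIES, each built from the values of all sites at that time.
long_form.boxplot(column="value", by="SPECIES")
plt.suptitle("")  # drop the automatic "Boxplot grouped by ..." title
```

After the reshape, long_form has one row per original column, so each box summarises every value that shares a SPECIES label at the chosen time.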
An example with the Iris dataset:
import pandas as pd
import matplotlib.pyplot as plt

data = pd.read_csv('https://raw.githubusercontent.com/pandas-dev/pandas/master/pandas/tests/io/data/csv/iris.csv')

fig, ax = plt.subplots(figsize=(10,8))
# One box of SepalLength for each value of Name
data.boxplot(column=['SepalLength'], by='Name', ax=ax)
# Called after boxplot, which sets its own "Boxplot grouped by ..." title
plt.suptitle('')
creates a plot with one box of SepalLength for each value of Name.
plt.suptitle('')
turns off the annoying automatic subtitle. And of course the column argument accepts a list of columns, so
data.boxplot(column=['SepalLength', 'SepalWidth'], by='Name', ax=ax)
also works.