
Subplots from a MultiIndex pandas DataFrame grouped by level

How do I make multiple plots from a multi-indexed pandas DataFrame based on one of the levels of the MultiIndex?

I have results from a model showing the usage of different technologies in different scenarios. The results could look something like this:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df=pd.DataFrame(abs(np.random.randn(12,4)),columns=[2011,2012,2013,2014])
df['scenario']=['s1','s1','s1','s2','s2','s3','s3','s3','s3','s4','s4','s4']
df['technology']=['t1','t2','t5','t2','t6','t1','t3','t4','t5','t1','t3','t4']
# sum per (scenario, technology), then transpose so that the years become the rows
dfg=df.groupby(['scenario','technology']).sum().transpose()

dfg holds the technologies employed each year for each scenario. I would like one subplot per scenario, with all subplots sharing a single legend.

If I simply use the argument subplots=True, it plots every scenario/technology combination (12 subplots):

dfg.plot(kind='bar',stacked=True,subplots=True)

Based on this response I got closer to what I was looking for:

f,a=plt.subplots(2,2)
fig1=dfg['s1'].plot(kind='bar',ax=a[0,0])
fig2=dfg['s2'].plot(kind='bar',ax=a[0,1])
fig3=dfg['s3'].plot(kind='bar',ax=a[1,0])
fig4=dfg['s4'].plot(kind='bar',ax=a[1,1])
plt.tight_layout()

But the result is not ideal: each subplot has its own legend, which makes the figure quite difficult to read. There must be an easier way to make subplots from a multi-indexed DataFrame. Thanks!

EDIT1: Ted Petrou proposed a nice solution using seaborn's factorplot, but I have two issues with it. First, I already have a style defined and I'd rather not use the seaborn style (one workaround could be to change seaborn's parameters). Second, I wanted a stacked bar plot, which requires considerable extra tweaks in seaborn. Is there any chance I can do something similar with Matplotlib?

asked Jan 23 '17 16:01 by Nabla



1 Answer

In my opinion, data analysis is easier when you 'tidy' your data first, so that each column represents one variable. Here, the 4 years are spread across different columns. Pandas has one method and one function for turning wide (messy) data into long (tidy) data: df.stack and pd.melt(df). Once the data is tidy, you can take advantage of the excellent seaborn library, which expects tidy data and makes it easy to plot almost anything you want.

Tidy the data

df1 = pd.melt(df, id_vars=['scenario', 'technology'], var_name='year')
print(df1.head())

  scenario technology  year     value
0       s1         t1  2011  0.406830
1       s1         t2  2011  0.495418
2       s1         t5  2011  0.116925
3       s2         t2  2011  0.904891
4       s2         t6  2011  0.525101
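The same long layout can also be reached through stack. The following is only a sketch that rebuilds the question's sample frame; the name df2 and the chain of set_index, stack, rename_axis and reset_index are one possible route, not the only one. Row order differs from the melt result (stack groups the rows by scenario and technology rather than by year).

```python
import numpy as np
import pandas as pd

# Rebuild the sample data from the question
df = pd.DataFrame(np.abs(np.random.randn(12, 4)), columns=[2011, 2012, 2013, 2014])
df['scenario'] = ['s1', 's1', 's1', 's2', 's2', 's3', 's3', 's3', 's3', 's4', 's4', 's4']
df['technology'] = ['t1', 't2', 't5', 't2', 't6', 't1', 't3', 't4', 't5', 't1', 't3', 't4']

# Move the identifier columns into the index, then stack the year columns
# into a third index level; reset_index turns every level back into a column.
df2 = (df.set_index(['scenario', 'technology'])
         .stack()
         .rename_axis(['scenario', 'technology', 'year'])
         .reset_index(name='value'))

print(df2.head())
```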

Use Seaborn

import seaborn as sns
# factorplot was renamed to catplot in seaborn 0.9
sns.catplot(x='year', y='value', hue='technology',
            col='scenario', data=df1, kind='bar', col_wrap=2,
            sharey=False)

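For the stacked bars and single shared legend that the question asks for, a plain Matplotlib route is also possible. The sketch below is one way to do it, not a definitive recipe: the helper name plot_by_scenario, the tab10 colormap, and the layout numbers are all assumptions chosen for illustration. The idea is to fix one color per technology, draw each scenario with legend=False, and then add one figure-level legend.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.patches import Patch

# Rebuild the sample data and the transposed, grouped frame from the question
df = pd.DataFrame(np.abs(np.random.randn(12, 4)), columns=[2011, 2012, 2013, 2014])
df['scenario'] = ['s1', 's1', 's1', 's2', 's2', 's3', 's3', 's3', 's3', 's4', 's4', 's4']
df['technology'] = ['t1', 't2', 't5', 't2', 't6', 't1', 't3', 't4', 't5', 't1', 't3', 't4']
dfg = df.groupby(['scenario', 'technology']).sum().transpose()


def plot_by_scenario(dfg, ncols=2):
    """Hypothetical helper: one stacked-bar subplot per scenario, one legend."""
    scenarios = dfg.columns.get_level_values('scenario').unique()
    technologies = sorted(dfg.columns.get_level_values('technology').unique())

    # One fixed color per technology, so a technology looks the same everywhere
    cmap = plt.get_cmap('tab10')
    colors = {t: cmap(i % 10) for i, t in enumerate(technologies)}

    nrows = -(-len(scenarios) // ncols)  # ceiling division
    fig, axes = plt.subplots(nrows, ncols, squeeze=False)
    flat_axes = axes.ravel()

    for ax, s in zip(flat_axes, scenarios):
        sub = dfg[s]  # columns are now just the technologies of this scenario
        sub.plot(kind='bar', stacked=True, ax=ax, legend=False, title=s,
                 color=[colors[t] for t in sub.columns])

    # Hide any unused axes when the grid is larger than the number of scenarios
    for ax in flat_axes[len(scenarios):]:
        ax.set_visible(False)

    # A single figure-level legend built from proxy patches
    handles = [Patch(facecolor=colors[t], label=t) for t in technologies]
    fig.legend(handles=handles, title='technology', loc='center right')
    fig.tight_layout(rect=(0, 0, 0.85, 1))  # leave room on the right for the legend
    return fig


fig = plot_by_scenario(dfg)
```

Call plt.show() or fig.savefig(...) afterwards to view the result. Because no seaborn function is involved, whatever Matplotlib style is already active stays in effect.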

answered Oct 23 '22 07:10 by Ted Petrou