
Unable to plot dataframe using seaborn barplot

I have been able to use pandas groupby to create a new DataFrame but I'm getting an error when I create a barplot. The groupby command:

invYr = invoices.groupby(['FinYear']).sum()[['Amount']]

This creates a new DataFrame that looks correct to me.

(Image: new DataFrame invYr)

Running:

sns.barplot(x='FinYear', y='Amount', data=invYr)

I get the error:

ValueError: Could not interperet input 'FinYear'

It appears that the issue is related to the index being FinYear, but unfortunately I have not been able to solve it, even when using reindex.

sams asked Feb 03 '16 03:02



1 Answer

import pandas as pd
import seaborn as sns

invoices = pd.DataFrame({'FinYear': [2015, 2015, 2014], 'Amount': [10, 10, 15]})
invYr = invoices.groupby(['FinYear']).sum()[['Amount']]

>>> invYr
         Amount
FinYear        
2014         15
2015         20

You are getting the error because, when you created invYr by grouping invoices, the FinYear column became the index and is no longer a column. There are a few solutions:
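A quick way to confirm this diagnosis is to inspect the grouped frame's columns and index. The sketch below reuses the sample invoices data from above and needs only pandas; the variable names columns, index_name and has_finyear_column are illustrative.

```python
import pandas as pd

invoices = pd.DataFrame({'FinYear': [2015, 2015, 2014], 'Amount': [10, 10, 15]})
invYr = invoices.groupby(['FinYear']).sum()[['Amount']]

# After the groupby, Amount is the only regular column left
columns = list(invYr.columns)

# FinYear now names the index, so looking it up as a column fails
index_name = invYr.index.name
has_finyear_column = 'FinYear' in invYr.columns

print(columns, index_name, has_finyear_column)
```

If FinYear shows up as the index name but not among the columns, seaborn cannot resolve x='FinYear' against data=invYr, which matches the error in the question.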

1) One solution is to pass the data itself rather than column names. If you do not specify a data parameter, Seaborn does not know which dataframe/series holds 'FinYear' or 'Amount', as these are just text values. Passing y=invYr.Amount supplies the Series directly, and the trick for x is to access the index of the dataframe, which is where FinYear now lives.

sns.barplot(x=invYr.index, y=invYr.Amount)

2) Alternatively, you can specify the data source and then refer to its columns by name. Note that reset_index() is called on the grouped dataframe so that FinYear moves out of the index and is available as a column again.

sns.barplot(x='FinYear', y='Amount', data=invYr.reset_index())

3) A third solution is to specify as_index=False when you perform the groupby, making the column available in the grouped dataframe.

invYr = invoices.groupby('FinYear', as_index=False).Amount.sum()
sns.barplot(x='FinYear', y='Amount', data=invYr)
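As a sanity check, solutions 2 and 3 should hand seaborn the same table. Here is a minimal sketch, again using only pandas and the sample data above; the names via_reset, via_as_index and same_values are illustrative.

```python
import pandas as pd

invoices = pd.DataFrame({'FinYear': [2015, 2015, 2014], 'Amount': [10, 10, 15]})

# Solution 2: group first, then move the index back into a column
via_reset = invoices.groupby(['FinYear']).sum()[['Amount']].reset_index()

# Solution 3: keep FinYear as a column during the groupby itself
via_as_index = invoices.groupby('FinYear', as_index=False).Amount.sum()

# Compare column names and cell values of the two results
same_columns = list(via_reset.columns) == list(via_as_index.columns)
same_values = via_reset.values.tolist() == via_as_index.values.tolist()

print(list(via_reset.columns), same_columns, same_values)
```

In both results FinYear is an ordinary column, so x='FinYear' can be resolved by name.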

All three solutions above produce the same plot.

Alexander answered Oct 14 '22 19:10
