Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

'Could not interpret input' error with Seaborn when plotting groupbys

Say I have this dataframe

d = {     'Path'   : ['abc', 'abc', 'ghi','ghi', 'jkl','jkl'],
          'Detail' : ['foo', 'bar', 'bar','foo','foo','foo'],
          'Program': ['prog1','prog1','prog1','prog2','prog3','prog3'],
          'Value'  : [30, 20, 10, 40, 40, 50],
          'Field'  : [50, 70, 10, 20, 30, 30] }


df = DataFrame(d)
df.set_index(['Path', 'Detail'], inplace=True)
df

               Field Program  Value
Path Detail                      
abc  foo        50   prog1     30
     bar        70   prog1     20
ghi  bar        10   prog1     10
     foo        20   prog2     40
jkl  foo        30   prog3     40
     foo        30   prog3     50

I can aggregate it no problem (if there's a better way to do this, by the way, I'd like to know!)

df_count = df.groupby('Program').count().sort(['Value'], ascending=False)[['Value']]
df_count

Program   Value
prog1    3
prog3    2
prog2    1

df_mean = df.groupby('Program').mean().sort(['Value'], ascending=False)[['Value']]
df_mean

Program  Value
prog3    45
prog2    40
prog1    20

I can plot it from Pandas no problem...

df_mean.plot(kind='bar')

But why do I get this error when I try it in seaborn?

sns.factorplot('Program',data=df_mean)
    ---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-26-23c2921627ec> in <module>()
----> 1 sns.factorplot('Program',data=df_mean)

C:\Anaconda3\lib\site-packages\seaborn\categorical.py in factorplot(x, y, hue, data, row, col, col_wrap, estimator, ci, n_boot, units, order, hue_order, row_order, col_order, kind, size, aspect, orient, color, palette, legend, legend_out, sharex, sharey, margin_titles, facet_kws, **kwargs)
   2673     # facets to ensure representation of all data in the final plot
   2674     p = _CategoricalPlotter()
-> 2675     p.establish_variables(x_, y_, hue, data, orient, order, hue_order)
   2676     order = p.group_names
   2677     hue_order = p.hue_names

C:\Anaconda3\lib\site-packages\seaborn\categorical.py in establish_variables(self, x, y, hue, data, orient, order, hue_order, units)
    143                 if isinstance(input, string_types):
    144                     err = "Could not interperet input '{}'".format(input)
--> 145                     raise ValueError(err)
    146 
    147             # Figure out the plotting orientation

ValueError: Could not interperet input 'Program'
like image 214
marshallbanana Avatar asked Oct 02 '15 13:10

marshallbanana


1 Answers

The reason for the exception you are getting is that Program becomes an index of the dataframes df_mean and df_count after your group_by operation.

If you wanted to get the factorplot from df_mean, an easy solution is to add the index as a column,

In [7]:

df_mean['Program'] = df_mean.index

In [8]:

%matplotlib inline
import seaborn as sns
sns.factorplot(x='Program', y='Value', data=df_mean)

However you could even more simply let factorplot do the calculations for you,

sns.factorplot(x='Program', y='Value', data=df)

You'll obtain the same result.

EDIT after comments

Indeed you make a very good point about the parameter as_index; by default it is set to True, and in that case Program becomes part of the index, as in your question.

In [14]:

df_mean = df.groupby('Program', as_index=True).mean().sort(['Value'], ascending=False)[['Value']]
df_mean

Out[14]:
        Value
Program 
prog3   45
prog2   40
prog1   20

Just to be clear, this way Program is not column anymore, but it becomes the index. the trick df_mean['Program'] = df_mean.index actually keeps the index as it is, and adds a new column for the index, so that Program is duplicated now.

In [15]:

df_mean['Program'] = df_mean.index
df_mean

Out[15]:
        Value   Program
Program     
prog3   45  prog3
prog2   40  prog2
prog1   20  prog1

However, if you set as_index to False, you get Program as a column, plus a new autoincrement index,

In [16]:

df_mean = df.groupby('Program', as_index=False).mean().sort(['Value'], ascending=False)[['Program', 'Value']]
df_mean

Out[16]:
    Program Value
2   prog3   45
1   prog2   40
0   prog1   20

This way you could feed it directly to seaborn. Still, you could use df and get the same result.

like image 117
lrnzcig Avatar answered Oct 15 '22 02:10

lrnzcig