
pandas pivot table value as column or index

Tags: pandas

How can I use the same column for 'values' and also for 'columns' or 'index'?

For example:

pd.pivot_table(data, values='Survived', index=['Survived', 'Sex', 'Pclass'],
               aggfunc=len, margins=True)

values and index use the same column Survived. When I try to run the above I get

ValueError: Grouper for 'Survived' not 1-dimensional

However, if instead of values='Survived' I use another column, the pivot_table works fine.

tadalendas asked Mar 14 '16 21:03


1 Answer

One issue I'm seeing is that you haven't set the columns argument when calling pivot_table (which tells pandas what values to use as the column headers for the pivot_table output).

A pivot table operation is actually a succession of groupby -> aggregate -> unstack. Say you have this DataFrame:

    survived sex pclass  other
0      False   f      a     29
1       True   f      b      6
2       True   f      b     22
3      False   m      b     55
4      False   f      a     59
..       ...  ..    ...    ...
95     False   f      a     66
96     False   f      c     42
97      True   m      c     93
98      True   m      c     59
99     False   f      b     93

You can pivot this table using pivot_table:

pd.pivot_table(df, index='sex', columns='pclass', values='other', aggfunc='sum')
pclass     a    b     c
sex                    
f       1000  840   306
m        728  851  1247

Or you can get the same result using groupby and unstack:

df.groupby(['sex', 'pclass'])['other'].sum().unstack()
pclass     a    b     c
sex                    
f       1000  840   306
m        728  851  1247
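To make the equivalence concrete, here is a minimal, self-contained sketch. The six-row DataFrame is made up for illustration (it is not the data shown above); only the column names are borrowed from the example.

```python
import pandas as pd

# Hypothetical toy data; the column names mirror the example above.
df = pd.DataFrame({
    "sex":    ["f", "f", "m", "m", "f", "m"],
    "pclass": ["a", "b", "a", "b", "a", "b"],
    "other":  [10, 20, 30, 40, 50, 60],
})

# Route 1: pivot_table groups, aggregates and reshapes in one call.
pivoted = pd.pivot_table(
    df, index="sex", columns="pclass", values="other", aggfunc="sum"
)

# Route 2: the same three steps written out by hand.
manual = df.groupby(["sex", "pclass"])["other"].sum().unstack()

print(pivoted)
print(manual)
```

Both routes should print the same sex-by-pclass table of sums.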

The point of this short story is that pivot tables are actually groupby operations. In your case, you're trying to group by ['Survived', 'Sex', 'Pclass'] and then aggregate 'Survived' again using len. That doesn't make much sense, since 'Survived' is already part of the output table's index (which is why pivot_table gives you an error).

You can, if you really want to make this work, use groupby instead:

df.groupby(['survived', 'sex', 'pclass', 'other'])['survived'].apply(len).unstack()

However, I suspect you actually want to achieve something else, though I'm not sure what.
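If the goal is simply to count rows for each combination of Survived, Sex and Pclass (an assumption; the question doesn't say so explicitly), the sketch below shows two possible routes. The small DataFrame and the helper column name "n" are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the asker's data.
data = pd.DataFrame({
    "Survived": [0, 1, 1, 0, 1, 0],
    "Sex":      ["m", "f", "f", "m", "m", "f"],
    "Pclass":   [3, 1, 1, 3, 2, 3],
})

# Option 1: size() counts rows per group and needs no values column at all.
counts = data.groupby(["Survived", "Sex", "Pclass"]).size()

# Option 2: keep pivot_table (and margins=True) by aggregating a copy of the
# column under a different, made-up name ("n"), so the grouping key is not
# reused as the values column.
table = pd.pivot_table(
    data.assign(n=data["Survived"]),
    values="n",
    index=["Survived", "Sex", "Pclass"],
    aggfunc="count",
    margins=True,
)

print(counts)
print(table)
```

The first option is the shorter one. The second keeps the 'All' margin row that margins=True adds.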

hbot answered Nov 15 '22 11:11