How can I use the same column for 'values' as for 'columns' or 'index'?
For example:
pd.pivot_table(data, values='Survived', index=['Survived', 'Sex', 'Pclass'],
aggfunc=len, margins=True)
values and index use the same column Survived. When I try to run the above I get
ValueError: Grouper for 'Survived' not 1-dimensional
However, if instead of values='Survived' I use another column, the pivot_table works fine.
How to group data using index in a pivot table? pivot_table takes a data argument and, usually, an index argument. data is the pandas DataFrame you pass to the function. index names the column (or columns) to group your data by. The grouping values appear as the index of the resulting table.
An index is like an address: it is how any data point in a DataFrame or Series is accessed. Both rows and columns have indexes. The row index is simply called the index, and the column index holds the column names. pandas has two main data structures, DataFrame and Series (the older Panel has since been removed).
An index on a Pandas DataFrame gives us a way to identify rows. Identifying rows by a “label” is arguably better than identifying a row by number. If you only have the integer position to work with, you have to remember the number for each row.
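As a small illustration of label-based versus position-based lookup, here is a minimal sketch. The DataFrame and its labels ("alice", "bob", "carol") are made up for the example and are not part of the question's data.

```python
import pandas as pd

# Hypothetical data: the row labels are invented for illustration.
df = pd.DataFrame(
    {"sex": ["f", "m", "f"], "pclass": ["a", "b", "c"]},
    index=["alice", "bob", "carol"],
)

by_label = df.loc["bob"]     # select the row by its index label
by_position = df.iloc[1]     # select the same row by integer position
```

Both lookups return the same row, but the label-based one keeps working if the rows are reordered.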
A separate notion is the Index custom calculation found in spreadsheet pivot tables, which gives you a picture of each value's importance in its row and column context. If all values in the pivot table were equal, each value would have an index of 1. If a value's index is greater than 1, it is of greater importance than other items in its row and column.
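To make that calculation concrete, here is a rough sketch in pandas. It assumes the usual spreadsheet definition, index = (cell value * grand total) / (row total * column total). That formula is not stated in the text above, and the helper name index_calculation is hypothetical.

```python
import pandas as pd

def index_calculation(table: pd.DataFrame) -> pd.DataFrame:
    """Assumed formula: value * grand total / (row total * column total)."""
    grand_total = table.to_numpy().sum()
    row_totals = table.sum(axis=1)   # one total per row label
    col_totals = table.sum(axis=0)   # one total per column label
    scaled = table.mul(grand_total)
    # Divide each row by its row total, then each column by its column total.
    return scaled.div(row_totals, axis=0).div(col_totals, axis=1)

# Hypothetical 2x2 pivot table.
table = pd.DataFrame({"a": [10, 30], "b": [20, 40]}, index=["f", "m"])
result = index_calculation(table)

# A table whose cells are all equal should give an index of 1 everywhere.
uniform = pd.DataFrame(5, index=["f", "m"], columns=["a", "b"])
uniform_result = index_calculation(uniform)
```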
One issue I'm seeing is that you haven't set the columns argument when calling pivot_table (which tells pandas what values to use as the column headers for the pivot_table output).
A pivot table operation is actually a succession of groupby -> aggregate -> unstack. Say you have this DataFrame:
    survived sex pclass  other
0      False   f      a     29
1       True   f      b      6
2       True   f      b     22
3      False   m      b     55
4      False   f      a     59
..       ...  ..    ...    ...
95     False   f      a     66
96     False   f      c     42
97      True   m      c     93
98      True   m      c     59
99     False   f      b     93
You can pivot this table using pivot_table:
pd.pivot_table(df, index='sex', columns='pclass', values='other', aggfunc=sum)
pclass     a    b     c
sex
f       1000  840   306
m        728  851  1247
Or you can get the same result using groupby and unstack:
df.groupby(['sex', 'pclass'])['other'].sum().unstack()
pclass     a    b     c
sex
f       1000  840   306
m        728  851  1247
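The equivalence can be checked on a small, self-contained example. This is a minimal sketch: the six-row DataFrame below is made up for illustration (it is not the 100-row frame shown above), and the string 'sum' is passed as the aggregation function.

```python
import pandas as pd

# Hypothetical data with every (sex, pclass) combination present.
df = pd.DataFrame({
    "sex": ["f", "f", "m", "m", "f", "m"],
    "pclass": ["a", "b", "a", "b", "a", "b"],
    "other": [10, 20, 30, 40, 50, 60],
})

# Route 1: pivot_table does the grouping, aggregation and reshaping in one call.
pivoted = pd.pivot_table(
    df, index="sex", columns="pclass", values="other", aggfunc="sum"
)

# Route 2: the same steps spelled out as groupby -> aggregate -> unstack.
grouped = df.groupby(["sex", "pclass"])["other"].sum().unstack()

same = pivoted.equals(grouped)
```

On this data both routes produce the same table, so same should be True.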
The point of this short story is that pivot tables are actually groupby operations. In your case, you're trying to group by ['Survived', 'Sex', 'Pclass'] and then aggregate 'Survived' again using len. That doesn't make much sense, since 'Survived' is already part of the output table's index (which is why pivot_table gives you an error).
You can, if you really want to make this work, use groupby instead:
df.groupby(['survived', 'sex', 'pclass', 'other'])['survived'].apply(len).unstack()
However, I think you actually want to achieve something else, though I'm not sure what.
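If the goal is simply to count the rows in each ('Survived', 'Sex', 'Pclass') group, which is only a guess at the intent, one possible sketch uses groupby with size, so that no values column is needed at all. The small DataFrame below is made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the question's data.
df = pd.DataFrame({
    "Survived": [0, 1, 1, 0, 1, 1],
    "Sex": ["f", "f", "m", "m", "f", "f"],
    "Pclass": [1, 1, 2, 2, 1, 3],
})

# Count rows per group; size() needs no separate values column.
counts = df.groupby(["Survived", "Sex", "Pclass"]).size()

# Optionally spread Pclass across the columns, filling empty groups with 0.
wide = counts.unstack("Pclass", fill_value=0)
```

Here counts is a Series indexed by the three grouping columns, and wide is the same information laid out with one column per Pclass value.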