I have a hopefully straightforward question that has been giving me a lot of difficulty for the last 3 hours. It should be easy.
Here's the challenge.
I have a pandas dataframe:
+--------------------------+ | Col 'X' Col 'Y' | +--------------------------+ | class 1 cat 1 | | class 2 cat 1 | | class 3 cat 2 | | class 2 cat 3 | +--------------------------+
What I am looking to transform the dataframe into:
+------------------------------------------+ | cat 1 cat 2 cat 3 | +------------------------------------------+ | class 1 1 0 0 | | class 2 1 0 1 | | class 3 0 1 0 | +------------------------------------------+
Where the values are value counts. Anybody have any insight? Thanks!
What is the difference between the pivot_table and the groupby? The groupby method is generally enough for two-dimensional operations, but pivot_table is used for multi-dimensional grouping operations.
Both pivot_table and groupby are used to aggregate your dataframe. The difference is only with regard to the shape of the result.
Basically, the pivot_table() function is a generalization of the pivot() function that allows aggregation of values — for example, through the len() function in the previous example. Pivot only works — or makes sense — if you need to pivot a table and show values without any aggregation. Here's an example.
Pandas pivot tables can be used in conjunction with the Pandas plotting functionality to create useful data visualisations. Simply adding . plot() to the end of your pivot table code will create a plot of the data.
Here are couple of ways to reshape your data df
In [27]: df Out[27]: Col X Col Y 0 class 1 cat 1 1 class 2 cat 1 2 class 3 cat 2 3 class 2 cat 3
1) Using pd.crosstab()
In [28]: pd.crosstab(df['Col X'], df['Col Y']) Out[28]: Col Y cat 1 cat 2 cat 3 Col X class 1 1 0 0 class 2 1 0 1 class 3 0 1 0
2) Or, use groupby
on 'Col X','Col Y'
with unstack
over Col Y
, then fill NaNs
with zeros.
In [29]: df.groupby(['Col X','Col Y']).size().unstack('Col Y', fill_value=0) Out[29]: Col Y cat 1 cat 2 cat 3 Col X class 1 1 0 0 class 2 1 0 1 class 3 0 1 0
3) Or, use pd.pivot_table()
with index=Col X
, columns=Col Y
In [30]: pd.pivot_table(df, index=['Col X'], columns=['Col Y'], aggfunc=len, fill_value=0) Out[30]: Col Y cat 1 cat 2 cat 3 Col X class 1 1 0 0 class 2 1 0 1 class 3 0 1 0
4) Or, use set_index
with unstack
In [492]: df.assign(v=1).set_index(['Col X', 'Col Y'])['v'].unstack(fill_value=0) Out[492]: Col Y cat 1 cat 2 cat 3 Col X class 1 1 0 0 class 2 1 0 1 class 3 0 1 0
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With