Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pivot Tables or Group By for Pandas?

I have a hopefully straightforward question that has been giving me a lot of difficulty for the last 3 hours. It should be easy.

Here's the challenge.

I have a pandas dataframe:

+--------------------------+ |     Col 'X'    Col 'Y'  | +--------------------------+ |     class 1      cat 1  | |     class 2      cat 1  | |     class 3      cat 2  | |     class 2      cat 3  | +--------------------------+ 

What I am looking to transform the dataframe into:

+------------------------------------------+ |                  cat 1    cat 2    cat 3 | +------------------------------------------+ |     class 1         1        0        0  | |     class 2         1        0        1  | |     class 3         0        1        0  | +------------------------------------------+ 

Where the values are value counts. Anybody have any insight? Thanks!

like image 518
SteelyDanish Avatar asked Jun 06 '15 05:06

SteelyDanish


People also ask

What is the difference between pivot table and Groupby in pandas?

What is the difference between the pivot_table and the groupby? The groupby method is generally enough for two-dimensional operations, but pivot_table is used for multi-dimensional grouping operations.

Is group by or pivot same?

Both pivot_table and groupby are used to aggregate your dataframe. The difference is only with regard to the shape of the result.

What is the difference between pivot and pivot table in pandas?

Basically, the pivot_table() function is a generalization of the pivot() function that allows aggregation of values — for example, through the len() function in the previous example. Pivot only works — or makes sense — if you need to pivot a table and show values without any aggregation. Here's an example.

Why do we use pivot table in pandas?

Pandas pivot tables can be used in conjunction with the Pandas plotting functionality to create useful data visualisations. Simply adding . plot() to the end of your pivot table code will create a plot of the data.


1 Answers

Here are couple of ways to reshape your data df

In [27]: df Out[27]:      Col X  Col Y 0  class 1  cat 1 1  class 2  cat 1 2  class 3  cat 2 3  class 2  cat 3 

1) Using pd.crosstab()

In [28]: pd.crosstab(df['Col X'], df['Col Y']) Out[28]: Col Y    cat 1  cat 2  cat 3 Col X class 1      1      0      0 class 2      1      0      1 class 3      0      1      0 

2) Or, use groupby on 'Col X','Col Y' with unstack over Col Y, then fill NaNs with zeros.

In [29]: df.groupby(['Col X','Col Y']).size().unstack('Col Y', fill_value=0) Out[29]: Col Y    cat 1  cat 2  cat 3 Col X class 1      1      0      0 class 2      1      0      1 class 3      0      1      0 

3) Or, use pd.pivot_table() with index=Col X, columns=Col Y

In [30]: pd.pivot_table(df, index=['Col X'], columns=['Col Y'], aggfunc=len, fill_value=0) Out[30]: Col Y    cat 1  cat 2  cat 3 Col X class 1      1      0      0 class 2      1      0      1 class 3      0      1      0 

4) Or, use set_index with unstack

In [492]: df.assign(v=1).set_index(['Col X', 'Col Y'])['v'].unstack(fill_value=0) Out[492]: Col Y    cat 1  cat 2  cat 3 Col X class 1      1      0      0 class 2      1      0      1 class 3      0      1      0 
like image 175
Zero Avatar answered Sep 18 '22 17:09

Zero