Is There Complete Overlap Between `pd.pivot_table` and `pd.DataFrame.groupby` + `pd.DataFrame.unstack`?

Tags: python, pandas, group-by, pivot-table

(Please note that there's a question "Pandas: group by and Pivot table difference", but this question is different.)

Suppose you start with a DataFrame

>>> import pandas as pd
>>> df = pd.DataFrame({'a': ['x'] * 2 + ['y'] * 2, 'b': [0, 1, 0, 1], 'val': range(4)})
>>> df
   a  b  val
0  x  0    0
1  x  1    1
2  y  0    2
3  y  1    3

Now suppose you want to make a the index, b the columns, and val the cell values, and to specify what to do if two or more values land in the same cell.

You can do this either through

df.val.groupby([df.a, df.b]).sum().unstack()

or through

pd.pivot_table(df, index='a', columns='b', values='val', aggfunc='sum')

So it seems to me that there's a simple correspondence between the two (given one, you could almost write a script to transform it into the other). I also thought of more complex cases with hierarchical indices / columns, but I still see no difference.
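A quick way to check the claimed correspondence on the small frame above is to build both results and compare them. This is only a sketch for this one example, not a proof of general equivalence; the variable names `via_groupby`, `via_pivot` and `same` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'a': ['x'] * 2 + ['y'] * 2, 'b': [0, 1, 0, 1], 'val': range(4)})

# Route 1: group on both keys, aggregate, then move 'b' into the columns
via_groupby = df.groupby(['a', 'b'])['val'].sum().unstack('b')

# Route 2: the pivot_table call from the question
via_pivot = pd.pivot_table(df, index='a', columns='b', values='val', aggfunc='sum')

# Compare labels and values separately, without relying on matching dtypes
same = bool(
    via_groupby.index.equals(via_pivot.index)
    and via_groupby.columns.equals(via_pivot.columns)
    and (via_groupby.to_numpy() == via_pivot.to_numpy()).all()
)
print(same)
```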

Is there something I've missed?

Are there operations that can be performed using one and not the other?
Are there, perhaps, operations easier to perform using one over the other?
If not, why not deprecate pivot_table? groupby seems much more general.

asked Sep 24 '16 09:09

Ami Tavory

1 Answer

If I understood the source code for pivot_table(index, columns, values, aggfunc) correctly, it is a tuned-up equivalent of the following (with index and columns given as lists of column names):

df.groupby(index + columns).agg(aggfunc).unstack(columns)

plus:

margins (subtotals and grand totals, as @ayhan has already said)
pivot_table() also removes extra levels from the columns axis (see the example below)
a convenient dropna parameter: do not include columns whose entries are all NaN
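To illustrate the margins point, here is a rough sketch comparing `margins=True` with totals added by hand after `groupby` + `unstack`. It uses the docstring sample frame from the demo in this answer; the names `with_margins` and `by_hand` are hypothetical, and the manual version only covers the simple one-index, one-column case with a sum.

```python
import pandas as pd

# Sample frame from the pivot_table docstring (same data as the demo)
df = pd.DataFrame({
    'A': ['foo'] * 5 + ['bar'] * 4,
    'B': ['one', 'one', 'one', 'two', 'two', 'one', 'one', 'two', 'two'],
    'C': ['small', 'large', 'large', 'small', 'small', 'large', 'small', 'small', 'large'],
    'D': [1, 2, 2, 3, 3, 4, 5, 6, 7],
})

# pivot_table: one keyword adds an 'All' row and an 'All' column
with_margins = df.pivot_table(index='A', columns='C', values='D',
                              aggfunc='sum', margins=True)

# groupby + unstack: the totals have to be appended manually
by_hand = df.groupby(['A', 'C'])['D'].sum().unstack('C')
by_hand['All'] = by_hand.sum(axis=1)      # row totals
by_hand.loc['All'] = by_hand.sum(axis=0)  # column totals, including the grand total

print(with_margins)
print(by_hand)
```

For other aggregations (a mean, say) the manual totals would need to be recomputed from the raw data rather than from the already aggregated cells, which is part of what makes the keyword convenient.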

Demo: (I took this DF from the docstring [source code for pivot_table()])

In [40]: df
Out[40]:
     A    B      C  D
0  foo  one  small  1
1  foo  one  large  2
2  foo  one  large  2
3  foo  two  small  3
4  foo  two  small  3
5  bar  one  large  4
6  bar  one  small  5
7  bar  two  small  6
8  bar  two  large  7

In [41]: df.pivot_table(index=['A','B'], columns='C', values='D', aggfunc=[np.sum,np.mean])
Out[41]:
          sum        mean
C       large small large small
A   B
bar one   4.0   5.0   4.0   5.0
    two   7.0   6.0   7.0   6.0
foo one   4.0   1.0   2.0   1.0
    two   NaN   6.0   NaN   3.0

Pay attention to the extra top-level column, D, in the groupby version:

In [42]: df.groupby(['A','B','C']).agg([np.sum, np.mean]).unstack('C')
Out[42]:
            D
          sum        mean
C       large small large small
A   B
bar one   4.0   5.0   4.0   5.0
    two   7.0   6.0   7.0   6.0
foo one   4.0   1.0   2.0   1.0
    two   NaN   6.0   NaN   3.0
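If the extra top level is unwanted, it can be dropped from the groupby result so that it lines up with the pivot_table output. The sketch below assumes the same docstring frame and uses string aggregation names (`'sum'`, `'mean'`) in place of `np.sum` / `np.mean`; `wide`, `flat` and `pivot` are illustrative names.

```python
import numpy as np
import pandas as pd

# Sample frame from the pivot_table docstring (same data as the demo)
df = pd.DataFrame({
    'A': ['foo'] * 5 + ['bar'] * 4,
    'B': ['one', 'one', 'one', 'two', 'two', 'one', 'one', 'two', 'two'],
    'C': ['small', 'large', 'large', 'small', 'small', 'large', 'small', 'small', 'large'],
    'D': [1, 2, 2, 3, 3, 4, 5, 6, 7],
})

pivot = df.pivot_table(index=['A', 'B'], columns='C', values='D',
                       aggfunc=['sum', 'mean'])

# groupby route: the columns carry three levels (value column, function, C)
wide = df.groupby(['A', 'B', 'C']).agg(['sum', 'mean']).unstack('C')

# Drop the outermost 'D' level to get the two-level layout pivot_table returns
flat = wide.droplevel(0, axis=1)

print(wide.columns.nlevels, flat.columns.nlevels, pivot.columns.nlevels)
print(np.allclose(flat.to_numpy(), pivot.to_numpy(), equal_nan=True))
```

Selecting the column before aggregating, as in `df.groupby(['A', 'B', 'C'])['D']`, is another way to avoid the extra level in the first place.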

why not deprecate pivot_table? groupby seems much more general.

IMO, because it's very easy to use and very convenient! ;)

answered Sep 28 '22 13:09

MaxU - stop WAR against UA
