Pandas: group by and Pivot table difference

Tags: python, pandas

I just started learning Pandas and was wondering if there is any difference between the groupby() and pivot_table() functions. Can anyone help me understand the difference between them? Help would be appreciated.

asked Jan 10 '16 06:01

user4943236

2 Answers

Both pivot_table and groupby are used to aggregate your dataframe. The difference is only with regard to the shape of the result.

Using pd.pivot_table(df, index=["a"], columns=["b"], values="c", aggfunc="sum"), a table is created with a on the row axis, b on the column axis, and the sum of c in each cell.

Example:

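```python
import numpy as np
import pandas as pd

# c is random, so your numbers will differ from the output below
df = pd.DataFrame({"a": [1, 2, 3, 1, 2, 3],
                   "b": [1, 1, 1, 2, 2, 2],
                   "c": np.random.rand(6)})

# values="c" (a string, not ["c"]) keeps the columns flat, as in the output below;
# aggfunc="sum" gives the same result as np.sum and avoids a FutureWarning in recent pandas
pd.pivot_table(df, index=["a"], columns=["b"], values="c", aggfunc="sum")
```

```
b         1         2
a
1  0.528470  0.484766
2  0.187277  0.144326
3  0.866832  0.650100
```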
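In this example every (a, b) pair occurs exactly once, so each cell holds a single value of c and the sum does no real aggregating. A minimal sketch with a repeated pair shows the aggregation at work; the small integer data below is hypothetical, chosen only so the sums can be checked by hand.

```python
import pandas as pd

# Hypothetical data: the pair (a=1, b=1) appears twice, (a=1, b=2) never appears
df = pd.DataFrame({"a": [1, 1, 2, 2],
                   "b": [1, 1, 1, 2],
                   "c": [10, 5, 7, 3]})

table = pd.pivot_table(df, index="a", columns="b", values="c", aggfunc="sum")

print(table.loc[1, 1])  # 15.0, because the two rows for (1, 1) are summed (10 + 5)
print(table.loc[1, 2])  # nan, because the combination (1, 2) never occurs
```

Passing fill_value=0 to pivot_table would put 0 in the empty cell instead of NaN.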

With groupby, the given keys become levels of the row index (a MultiIndex when there is more than one key), and one row is created for each combination of those keys that occurs in the data.

In this example, we create a Series holding the sum of c for each unique combination of a and b.

```python
df.groupby(["a", "b"])["c"].sum()
```

```
a  b
1  1    0.528470
   2    0.484766
2  1    0.187277
   2    0.144326
3  1    0.866832
   2    0.650100
Name: c, dtype: float64
```
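Because the keys end up in the index rather than in ordinary columns, a common follow-up is to flatten the result. A short sketch reusing the df from this answer; reset_index() and groupby's as_index=False option are both standard pandas, and which one you prefer is a matter of taste.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 1, 2, 3],
                   "b": [1, 1, 1, 2, 2, 2],
                   "c": np.random.rand(6)})

sums = df.groupby(["a", "b"])["c"].sum()
print(list(sums.index.names))  # ['a', 'b']: the keys are index levels

# Option 1: move the index levels back into ordinary columns
flat = sums.reset_index()
print(flat.columns.tolist())  # ['a', 'b', 'c']

# Option 2: ask groupby not to put the keys in the index at all
flat2 = df.groupby(["a", "b"], as_index=False)["c"].sum()
print(flat2.columns.tolist())  # ['a', 'b', 'c']
```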

A similar use of groupby is to omit the ['c']. In this case, it returns a DataFrame (not a Series) holding the sums of all remaining columns, grouped by the unique combinations of a and b.

```python
print(df.groupby(["a", "b"]).sum())
```

```
            c
a b
1 1  0.528470
  2  0.484766
2 1  0.187277
  2  0.144326
3 1  0.866832
  2  0.650100
```
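Whether you get a Series or a DataFrame depends on how the column is selected after groupby. A short sketch with the same df; the double-bracket form [["c"]] is not used in the answer and is added here because it keeps a DataFrame while still restricting the aggregation to c.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 1, 2, 3],
                   "b": [1, 1, 1, 2, 2, 2],
                   "c": np.random.rand(6)})

as_series = df.groupby(["a", "b"])["c"].sum()       # single column name -> Series
as_frame = df.groupby(["a", "b"]).sum()             # all remaining columns -> DataFrame
as_frame_too = df.groupby(["a", "b"])[["c"]].sum()  # list of column names -> DataFrame

print(type(as_series).__name__)  # Series
print(type(as_frame).__name__)   # DataFrame
```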

answered Oct 09 '22 13:10

David Maust

It's more appropriate to use .pivot_table() instead of .groupby() when you need to show aggregates with both row and column labels.

.pivot_table() makes it easy to create row and column labels at the same time and is preferable, even though you can get similar results using .groupby() with a few extra steps.
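The extra steps with groupby amount to aggregating and then unstacking one index level into the columns. A minimal sketch that checks the two routes agree, using the df from the first answer; pd.testing.assert_frame_equal raises an AssertionError if the frames differ in labels or values.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 1, 2, 3],
                   "b": [1, 1, 1, 2, 2, 2],
                   "c": np.random.rand(6)})

# One call: a on the rows, b on the columns, sum of c in each cell
pivoted = pd.pivot_table(df, index="a", columns="b", values="c", aggfunc="sum")

# Two steps: aggregate into a Series indexed by (a, b), then move level "b" into the columns
grouped = df.groupby(["a", "b"])["c"].sum().unstack("b")

pd.testing.assert_frame_equal(pivoted, grouped)  # passes: same labels, same values
```

Since unstack() moves the last index level by default, and b is the last level here, .unstack() with no argument would give the same result.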

answered Oct 09 '22 11:10

kyramichel
