
Pandas: group by and Pivot table difference

Tags:

python

pandas

I just started learning pandas and was wondering whether there is any difference between the groupby() and pivot_table() functions. Can anyone help me understand the difference between them? Help would be appreciated.

user4943236 asked Jan 10 '16 06:01



2 Answers

Both pivot_table and groupby are used to aggregate your dataframe. The difference is only with regard to the shape of the result.

Calling pd.pivot_table(df, index=["a"], columns=["b"], values=["c"], aggfunc=np.sum) creates a table where a is on the row axis, b is on the column axis, and the values are the sums of c.

Example:

    import numpy as np
    import pandas as pd

    # c is random, so the numbers below differ on every run
    df = pd.DataFrame({"a": [1,2,3,1,2,3], "b": [1,1,1,2,2,2], "c": np.random.rand(6)})
    pd.pivot_table(df, index=["a"], columns=["b"], values=["c"], aggfunc=np.sum)

    b         1         2
    a
    1  0.528470  0.484766
    2  0.187277  0.144326
    3  0.866832  0.650100
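A reproducible variant of the example above, as a rough sketch: the numbers in c are made up for illustration and replace np.random.rand so the result is the same on every run. It also passes values="c" and aggfunc="sum" as plain strings, which is a stylistic assumption rather than the call used in the answer. With the list form values=["c"], pandas puts an extra "c" level above the b column labels.

```python
import pandas as pd

# Fixed, made-up values so the output is reproducible (assumption, not from the answer)
df = pd.DataFrame({
    "a": [1, 2, 3, 1, 2, 3],
    "b": [1, 1, 1, 2, 2, 2],
    "c": [10.0, 20.0, 30.0, 40.0, 50.0, 60.0],
})

# Scalar values= gives single-level columns labelled by b
wide = pd.pivot_table(df, index="a", columns="b", values="c", aggfunc="sum")

# List values= adds a "c" level above the b labels
wide_list = pd.pivot_table(df, index=["a"], columns=["b"], values=["c"], aggfunc="sum")

print(wide)
print(wide.loc[2, 1])             # cell for a=2, b=1
print(wide.columns.nlevels)       # levels in the scalar form
print(wide_list.columns.nlevels)  # levels in the list form
```

Each cell is addressed by one row label (a) and one column label (b), which is the shape difference the answer describes.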

With groupby, the dimensions given become levels of the row index, and one row is created for each combination of those dimensions.

In this example, we create a Series of the sums of c, grouped by every unique combination of a and b.

    df.groupby(['a','b'])['c'].sum()

    a  b
    1  1    0.528470
       2    0.484766
    2  1    0.187277
       2    0.144326
    3  1    0.866832
       2    0.650100
    Name: c, dtype: float64

Omitting the ['c'] selection works similarly, but it returns a DataFrame (not a Series) holding the sums of all remaining columns, grouped by the unique values of a and b.

    print(df.groupby(["a","b"]).sum())

                c
    a b
    1 1  0.528470
      2  0.484766
    2 1  0.187277
      2  0.144326
    3 1  0.866832
      2  0.650100
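To make the Series versus DataFrame distinction concrete, here is a small sketch using fixed, made-up values for c (an assumption for reproducibility; the answer uses random numbers). The names as_series and as_frame are only illustrative.

```python
import pandas as pd

# Made-up values for c so the result is reproducible
df = pd.DataFrame({
    "a": [1, 2, 3, 1, 2, 3],
    "b": [1, 1, 1, 2, 2, 2],
    "c": [10.0, 20.0, 30.0, 40.0, 50.0, 60.0],
})

# Selecting ['c'] before aggregating yields a Series
as_series = df.groupby(["a", "b"])["c"].sum()

# Without the selection, every remaining column is summed into a DataFrame
as_frame = df.groupby(["a", "b"]).sum()

print(type(as_series).__name__)
print(type(as_frame).__name__)
print(list(as_frame.columns))

# Both are indexed by (a, b) pairs, so a single group is looked up with a tuple
print(as_series.loc[(1, 2)])
print(as_frame.loc[(1, 2), "c"])
```

In both results a and b sit in the row index, so the output stays long (one row per combination) rather than spreading b across columns.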

David Maust answered Oct 09 '22 13:10


It's more appropriate to use .pivot_table() instead of .groupby() when you need to show aggregates with both row and column labels.

.pivot_table() makes it easy to create row and column labels at the same time and is preferable, even though you can get similar results using .groupby() with a few extra steps.
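One way to read the "extra steps" is a groupby followed by unstack, which moves an index level into the columns. This is a sketch of that idea on made-up data, not necessarily the steps this answer had in mind. The column names a, b, c are borrowed from the other answer.

```python
import pandas as pd

# Made-up values so the comparison is reproducible
df = pd.DataFrame({
    "a": [1, 2, 3, 1, 2, 3],
    "b": [1, 1, 1, 2, 2, 2],
    "c": [10.0, 20.0, 30.0, 40.0, 50.0, 60.0],
})

# One call: a on the rows, b on the columns, sum of c in the cells
pivoted = pd.pivot_table(df, index="a", columns="b", values="c", aggfunc="sum")

# Two steps: aggregate per (a, b), then move the b level to the columns
unstacked = df.groupby(["a", "b"])["c"].sum().unstack("b")

print(pivoted)
print(unstacked)

# Cell-by-cell comparison of the two results
print((pivoted.to_numpy() == unstacked.to_numpy()).all())
```

On this data both routes give a 3 x 2 table with the same sums. pivot_table just states the row and column layout in a single call.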

kyramichel answered Oct 09 '22 11:10