 

Difference between "as_index = False", and "reset_index()" in pandas groupby

I just wanted to know the difference between what these two do.

Data:

import pandas as pd
df = pd.DataFrame({"ID":["A","B","A","C","A","A","C","B"], "value":[1,2,4,3,6,7,3,4]})

reset_index():

df_group1 = df.groupby("ID").sum().reset_index()

as_index=False:

df_group2 = df.groupby("ID", as_index=False).sum()

Both of them give the exact same output.

  ID  value
0  A     18
1  B      6
2  C      6

Can anyone explain the difference, ideally with an example that illustrates it?

asked Aug 15 '18 21:08 by Rohith


People also ask

What is as_index=False in pandas?

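With as_index=False, the group keys are returned as ordinary columns instead of becoming the index of the result, which is effectively "SQL-style" grouped output. (The separate sort parameter controls whether the group keys are sorted; turning it off can improve performance, and it does not affect the order of observations within each group.)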

What is reset_index() in pandas?

The DataFrame.reset_index() method resets the index back to the default integer index (0, 1, 2, ...). By default it keeps the old index as a column (named "index" when the old index has no name); to discard it instead, pass drop=True.
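A minimal sketch of the default versus drop=True behaviour, assuming a small hypothetical frame with an unnamed, non-default index:

```python
import pandas as pd

# Hypothetical frame whose index holds labels "a", "b", "c" (no index name).
df = pd.DataFrame({"value": [10, 20, 30]}, index=["a", "b", "c"])

# Default: the old labels are kept as a new column called "index".
kept = df.reset_index()

# drop=True: the old labels are thrown away, only 0, 1, 2 remains.
dropped = df.reset_index(drop=True)

print(kept)
print(dropped)
```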

What does inplace=True do in reset_index()?

When you set inplace=True, reset_index() does not return a new DataFrame. Instead, it modifies your original DataFrame directly and returns None.
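A short sketch, again using a hypothetical frame, showing that the call modifies the frame itself and that its return value is None (so it should not be assigned back to the variable):

```python
import pandas as pd

# Hypothetical frame with a non-default index.
df = pd.DataFrame({"value": [10, 20, 30]}, index=["a", "b", "c"])

# inplace=True rewrites df itself; the method returns None.
returned = df.reset_index(drop=True, inplace=True)

print(returned)             # None
print(df.index.tolist())    # [0, 1, 2]
```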

How do I get rid of the index after a groupby in pandas?

Call reset_index() on the grouped result, or pass as_index=False to groupby() so the keys never become the index in the first place.


1 Answer

When you use as_index=False, you tell groupby() that you don't want to set the column ID as the index (duh!). When both implementations yield the same results, use as_index=False: it saves you some typing and an unnecessary pandas operation ;)
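A sketch using the question's own data that makes the extra step visible: the reset_index() route first builds an intermediate frame with ID in the index, while as_index=False goes straight to the final shape.

```python
import pandas as pd

df = pd.DataFrame({"ID": ["A", "B", "A", "C", "A", "A", "C", "B"],
                   "value": [1, 2, 4, 3, 6, 7, 3, 4]})

# Two steps: groupby() puts ID into the index, reset_index() moves it back out.
intermediate = df.groupby("ID").sum()
df_group1 = intermediate.reset_index()

# One step: as_index=False never moves ID into the index.
df_group2 = df.groupby("ID", as_index=False).sum()

print(intermediate.index.name)      # ID
print(df_group1.equals(df_group2))  # True
```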

However, sometimes you want to apply more complicated operations to your groups. On those occasions, you might find that one is better suited than the other.

Example 1: You want to sum the values of three variables (i.e. columns) in a group on both axes.

Using as_index=True (the default) moves ID into the index, so you can sum over axis=1 without specifying the names of the columns and sum the values over axis 0. When the operation is finished, you can use reset_index(drop=True/False) to get the DataFrame back into the right shape.
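A minimal sketch of Example 1, assuming hypothetical value columns x, y and z (they are not from the original question). Because ID sits in the index, every remaining column is numeric and can be summed row-wise without naming it:

```python
import pandas as pd

# Hypothetical data: one group key and three numeric columns.
df = pd.DataFrame({
    "ID": ["A", "B", "A", "C", "A", "B"],
    "x": [1, 2, 3, 4, 5, 6],
    "y": [10, 20, 30, 40, 50, 60],
    "z": [100, 200, 300, 400, 500, 600],
})

# as_index=True (default): sum within each group (axis 0); ID becomes the index.
per_group = df.groupby("ID").sum()

# Sum across x, y, z (axis 1) without listing the column names.
per_group["total"] = per_group.sum(axis=1)

# Move ID back out of the index into a regular column.
result = per_group.reset_index()
print(result)
```

With as_index=False, ID would be an ordinary string column in the grouped frame, so a blanket row-wise sum would also pick it up unless you select the numeric columns explicitly.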

Example 2: You need to set a value for the group based on the columns in the groupby().

Setting as_index=False allows you to check the condition on a regular column rather than on an index, which is often much easier.
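A sketch of Example 2, assuming a hypothetical second key column named kind and a hypothetical priority label to be set per group. With as_index=False the condition is an ordinary column mask; with the default as_index=True the same key has to be pulled out of the MultiIndex:

```python
import pandas as pd

# The question's data plus a hypothetical second grouping column "kind".
df = pd.DataFrame({
    "ID": ["A", "B", "A", "C", "A", "A", "C", "B"],
    "kind": ["x", "y", "y", "x", "x", "y", "x", "y"],
    "value": [1, 2, 4, 3, 6, 7, 3, 4],
})

# as_index=False: ID and kind stay ordinary columns.
summary = df.groupby(["ID", "kind"], as_index=False).sum()
summary["priority"] = "low"
summary.loc[summary["kind"] == "x", "priority"] = "high"

# as_index=True (default): the keys live in a MultiIndex instead.
indexed = df.groupby(["ID", "kind"]).sum()
indexed["priority"] = "low"
is_x = indexed.index.get_level_values("kind") == "x"
indexed.loc[is_x, "priority"] = "high"

print(summary)
print(indexed)
```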

At some point, you might come across a KeyError when applying operations to groups. That is often because your aggregate function is trying to use a column that has become the index of the grouped result rather than remaining a regular column.
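A small sketch with the question's data reproducing that KeyError and the two usual fixes:

```python
import pandas as pd

df = pd.DataFrame({"ID": ["A", "B", "A", "C", "A", "A", "C", "B"],
                   "value": [1, 2, 4, 3, 6, 7, 3, 4]})

# Default as_index=True: ID is now the index, not a column.
grouped = df.groupby("ID").sum()

try:
    _ = grouped["ID"]   # looks for a column named ID
    raised = False
except KeyError:
    raised = True

# Either fix turns ID back into an ordinary column.
ids_a = df.groupby("ID", as_index=False).sum()["ID"].tolist()
ids_b = grouped.reset_index()["ID"].tolist()

print(raised, ids_a, ids_b)  # True ['A', 'B', 'C'] ['A', 'B', 'C']
```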

answered Sep 28 '22 02:09 by qmeeus