
What is as_index in groupby in pandas?

Tags:

python

pandas

What exactly is the function of as_index in groupby in Pandas?

Haritha asked Dec 20 '16 06:12



2 Answers

print() is your friend when you don't understand something. It often clears up doubts.

Take a look:

    import pandas as pd

    df = pd.DataFrame(data={'books': ['bk1', 'bk1', 'bk1', 'bk2', 'bk2', 'bk3'],
                            'price': [12, 12, 12, 15, 15, 17]})

    print(df)
    print(df.groupby('books', as_index=True).sum())
    print(df.groupby('books', as_index=False).sum())

Output:

      books  price
    0   bk1     12
    1   bk1     12
    2   bk1     12
    3   bk2     15
    4   bk2     15
    5   bk3     17

           price
    books       
    bk1       36
    bk2       30
    bk3       17

      books  price
    0   bk1     36
    1   bk2     30
    2   bk3     17

When as_index=True (the default), the key(s) you use in groupby() become the index of the new dataframe.
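To illustrate the "key(s)" part, here is a minimal sketch with two grouping keys. The `sales` frame and its `store` column are made up for this example and are not part of the original data.

```python
import pandas as pd

# Hypothetical data: 'store' is an invented second key for illustration.
sales = pd.DataFrame({
    'books': ['bk1', 'bk1', 'bk2', 'bk2'],
    'store': ['north', 'south', 'north', 'north'],
    'price': [12, 12, 15, 15],
})

# With as_index=True, both keys become levels of a MultiIndex.
multi = sales.groupby(['books', 'store'], as_index=True).sum()

# With as_index=False, both keys stay as ordinary columns.
flat = sales.groupby(['books', 'store'], as_index=False).sum()

print(list(multi.index.names))  # ['books', 'store']
print(list(flat.columns))       # ['books', 'store', 'price']
```

With several keys, a row of the indexed result is addressed by a tuple, for example `multi.loc[('bk2', 'north')]`.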

The benefits you get when you set the column as index are:

  1. Speed. When you select rows from the grouped result by index label, e.g. df.loc['bk1'], the lookup is faster because the index is hashed. pandas doesn't have to scan the entire books column to find 'bk1'; it computes the hash of 'bk1' and looks the row up directly.

  2. Ease. With as_index=True you can write df.loc['bk1'], which is shorter and faster than df.loc[df.books=='bk1'].
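The two lookup styles can be compared side by side. This is only a sketch that reuses the example data from this answer; the names `indexed`, `flat`, `by_label` and `by_mask` are mine, and it shows the syntax rather than measuring speed.

```python
import pandas as pd

df = pd.DataFrame({
    'books': ['bk1', 'bk1', 'bk1', 'bk2', 'bk2', 'bk3'],
    'price': [12, 12, 12, 15, 15, 17],
})

indexed = df.groupby('books', as_index=True).sum()
flat = df.groupby('books', as_index=False).sum()

# Label lookup on the index: returns the 'bk1' row as a Series.
by_label = indexed.loc['bk1']

# Boolean mask over the 'books' column: returns a one-row DataFrame.
by_mask = flat.loc[flat['books'] == 'bk1']

print(by_label['price'])         # 36
print(by_mask['price'].iloc[0])  # 36
```

Both lookups return the same value, but the first yields a Series and the second a DataFrame, so downstream code may need to handle the two shapes differently.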

Mohammad Yusuf answered Sep 22 '22 02:09


When using groupby(), as_index can be set to True or False depending on whether you want the column you grouped by to be the index of the output.

    import pandas as pd

    table_r = pd.DataFrame({
        'colors': ['orange', 'red', 'orange', 'red'],
        'price': [1000, 2000, 3000, 4000],
        'quantity': [500, 3000, 3000, 4000],
    })

    # sort_values() replaces the old DataFrame.sort(), which was removed from pandas
    new_group = table_r.groupby('colors', as_index=True).count().sort_values('price', ascending=False)
    print(new_group)

Output:

            price  quantity
    colors                 
    orange      2         2
    red         2         2

Now with as_index=False:

       colors  price  quantity
    0  orange      2         2
    1     red      2         2

Note that colors is no longer the index once we set as_index=False; it is an ordinary column and the rows get a default integer index.
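For completeness, here is a small sketch that runs both variants on the same table so the difference can be checked directly. The variable names `with_index` and `without_index` are mine, and the sort step is left out to keep the comparison simple.

```python
import pandas as pd

table_r = pd.DataFrame({
    'colors': ['orange', 'red', 'orange', 'red'],
    'price': [1000, 2000, 3000, 4000],
    'quantity': [500, 3000, 3000, 4000],
})

with_index = table_r.groupby('colors', as_index=True).count()
without_index = table_r.groupby('colors', as_index=False).count()

# The grouping column is the index name in one case...
print(with_index.index.name)        # colors
# ...and a regular column in the other.
print(list(without_index.columns))  # ['colors', 'price', 'quantity']

# reset_index() moves the index of the first result back into a column,
# which should give the same layout as as_index=False.
print(with_index.reset_index())
```

If you already have a result grouped with as_index=True, calling reset_index() on it is a common way to get the flat layout without re-running the groupby.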

Marc vT answered Sep 20 '22 02:09