What exactly is the function of <code>as_index</code> in <code>groupby</code> in Pandas?

When something is unclear, <code>print()</code> is your friend; looking at the actual output often settles the question. Take a look:
<pre class="prettyprint"><code>import pandas as pd

df = pd.DataFrame(data={'books': ['bk1', 'bk1', 'bk1', 'bk2', 'bk2', 'bk3'],
                        'price': [12, 12, 12, 15, 15, 17]})

print(df)

# as_index=True (the default): the group key becomes the index of the result
print(df.groupby('books', as_index=True).sum())

# as_index=False: the group key stays a regular column
print(df.groupby('books', as_index=False).sum())
</code></pre>
Output:
<pre class="prettyprint"><code>  books  price
0   bk1     12
1   bk1     12
2   bk1     12
3   bk2     15
4   bk2     15
5   bk3     17
       price
books
bk1       36
bk2       30
bk3       17
  books  price
0   bk1     36
1   bk2     30
2   bk3     17
</code></pre>
When <code>as_index=True</code> (the default), the key(s) you pass to <code>groupby()</code> become the index of the resulting DataFrame. When <code>as_index=False</code>, they stay ordinary columns and the result gets a default integer index.

The benefits you get when the column is the index are (here <code>df</code> stands for the grouped result, not the original frame):
<ol>
<li>Speed. When you select rows by index label, e.g. <code>df.loc['bk1']</code>, the lookup is typically faster because pandas uses a hash table over the index. It doesn't have to traverse the entire <code>books</code> column to find <code>'bk1'</code>; it hashes <code>'bk1'</code> and looks it up directly.</li>
<li>Ease. When <code>as_index=True</code> you can write <code>df.loc['bk1']</code>, which is shorter and typically faster than the boolean-mask form <code>df.loc[df.books=='bk1']</code> that you need when <code>books</code> is a regular column.</li>
</ol>
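To make the "ease" point concrete, here is a minimal sketch that fetches the same value from both grouped results. The names `by_index` and `by_column` are illustrative only and do not appear in the original answer; the data is the same `books`/`price` frame used above.

```python
import pandas as pd

df = pd.DataFrame({
    'books': ['bk1', 'bk1', 'bk1', 'bk2', 'bk2', 'bk3'],
    'price': [12, 12, 12, 15, 15, 17],
})

# Hypothetical names for the two grouped results
by_index = df.groupby('books', as_index=True).sum()    # 'books' is the index
by_column = df.groupby('books', as_index=False).sum()  # 'books' is a column

# Label-based lookup: possible because 'books' is the index
price_from_index = by_index.loc['bk1', 'price']

# Boolean mask: needed when 'books' is an ordinary column
price_from_column = by_column.loc[by_column['books'] == 'bk1', 'price'].iloc[0]

print(price_from_index, price_from_column)  # both are 36 for this data
```

Whether the speed difference matters depends on the size of the result; for a handful of groups like this, either form is fine and the choice is mostly about readability.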


answered Sep 22 '22 02:09

Mohammad Yusuf

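When using the <code>groupby</code> function, <code>as_index</code> can be set to <code>True</code> or <code>False</code> depending on whether you want the column you grouped by to be the index of the output.

<pre class="prettyprint"><code>import pandas as pd

table_r = pd.DataFrame({
    'colors': ['orange', 'red', 'orange', 'red'],
    'price': [1000, 2000, 3000, 4000],
    'quantity': [500, 3000, 3000, 4000],
})

# Count rows per color, then sort by the price count
# (DataFrame.sort was removed from pandas; sort_values replaces it)
new_group = table_r.groupby('colors', as_index=True).count().sort_values('price', ascending=False)
print(new_group)
</code></pre>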

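Output:

<pre class="prettyprint"><code>        price  quantity
colors
orange      2         2
red         2         2
</code></pre>

Now with <code>as_index=False</code>:

<pre class="prettyprint"><code>   colors  price  quantity
0  orange      2         2
1     red      2         2
</code></pre>

Note that <code>colors</code> is no longer the index when we set <code>as_index=False</code>; it is a regular column and the rows get a default integer index.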
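One way to see the relationship between the two outputs above: moving the index of the `as_index=True` result back into a column with `reset_index()` should give the same table as `as_index=False`. This is a small sketch under that assumption, reusing the answer's `table_r` data; `indexed`, `flat` and `same` are illustrative names that are not in the original answer.

```python
import pandas as pd

table_r = pd.DataFrame({
    'colors': ['orange', 'red', 'orange', 'red'],
    'price': [1000, 2000, 3000, 4000],
    'quantity': [500, 3000, 3000, 4000],
})

indexed = table_r.groupby('colors', as_index=True).count()  # 'colors' is the index
flat = table_r.groupby('colors', as_index=False).count()    # 'colors' is a column

# Turn the 'colors' index back into a column and compare column by column
same = indexed.reset_index().to_dict('list') == flat.to_dict('list')
print(same)
```

So `as_index=False` can be read as a shortcut for grouping and then calling `reset_index()`, at least for simple aggregations like this one.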

answered Sep 20 '22 02:09

Marc vT


Tags:

python

pandas

Haritha
