How to keep original index of a DataFrame after groupby 2 columns?

Is there any way I can retain the original index of my large DataFrame after I perform a groupby? I need this because I have to do an inner merge back to my original df (after the groupby) to regain the columns lost in the aggregation, and the index is the only unique key I can merge on. Does anyone know how I can achieve this?

My DataFrame is quite large. My groupby looks like this:

df.groupby(['col1', 'col2']).agg({'col3': 'count'}).reset_index()

This drops the index of my original DataFrame, which I want to keep.

asked Mar 11 '18 03:03

Hana

2 Answers

You can elevate your index to a column via reset_index. Then aggregate your index to a tuple via agg, together with your count aggregation.

Below is a minimal example.

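import numpy as np
import pandas as pd

# 50 rows of random integers with a random (non-unique) index
df = pd.DataFrame(np.random.randint(0, 4, (50, 5)),
                  index=np.random.randint(0, 4, 50))

# Move the index into a regular column named 'index'
df = df.reset_index()

# Count column 2 per group and collect the original index labels into a tuple
res = df.groupby([0, 1]).agg({2: 'count', 'index': lambda x: tuple(x)}).reset_index()

# Example output (the data is random, so values differ between runs):
#     0  1  2            index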
# 0   0  0  4     (2, 0, 0, 2)
# 1   0  1  4     (0, 3, 1, 1)
# 2   0  2  1             (1,)
# 3   0  3  1             (3,)
# 4   1  0  4     (1, 2, 1, 3)
# 5   1  1  2           (1, 3)
# 6   1  2  4     (2, 1, 2, 2)
# 7   1  3  1             (2,)
# 8   2  0  5  (0, 3, 0, 2, 2)
# 9   2  1  2           (0, 2)
# 10  2  2  5  (1, 1, 3, 3, 2)
# 11  2  3  2           (0, 1)
# 12  3  0  4     (0, 3, 3, 3)
# 13  3  1  4     (1, 3, 0, 1)
# 14  3  2  3        (3, 2, 1)
# 15  3  3  4     (3, 3, 2, 1)
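
As a rough sketch of how the tuple column could feed the merge described in the question: the frame below is hypothetical (the column names col1, col2, col3 come from the question, while the extra column, the values, and the unique index labels are made up for illustration). It expands the tuples back to one row per original index label with explode and then merges on that label.

```python
import pandas as pd

# Hypothetical data: a unique index plus an "extra" column that the groupby drops.
df = pd.DataFrame(
    {
        "col1": ["a", "a", "b", "b", "b"],
        "col2": ["x", "x", "y", "y", "z"],
        "col3": [1, 2, 3, 4, 5],
        "extra": [10, 20, 30, 40, 50],
    },
    index=[100, 101, 102, 103, 104],
)

# Same pattern as above: move the index into a column, then collect it per group.
flat = df.reset_index()
res = (
    flat.groupby(["col1", "col2"])
    .agg({"col3": "count", "index": lambda x: tuple(x)})
    .reset_index()
    .rename(columns={"col3": "count"})
)

# Expand each tuple to one row per original index label.
exploded = res.explode("index")
# explode leaves an object column; cast back so the merge keys have matching dtypes.
exploded["index"] = exploded["index"].astype("int64")

# Inner merge on the original index label, then restore it as the index.
merged = flat.merge(exploded[["index", "count"]], on="index", how="inner")
merged = merged.set_index("index")
print(merged)
```

This only works cleanly when the original index is unique, as the question states; with duplicate labels the merge would multiply rows.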

answered Sep 20 '22 15:09

jpp

I think you are looking for transform in this situation. It returns a result aligned to the original index, so no merge is needed:

df['count'] = df.groupby(['col1', 'col2'])['col3'].transform('count')
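
A small self-contained illustration of that line, under assumed data: the column names match the question, but the extra column, the values, and the index labels are hypothetical. The point is that transform broadcasts the per-group count back onto every row, so the original index and all other columns stay in place.

```python
import pandas as pd

# Hypothetical frame with a custom index and a column unrelated to the groupby.
df = pd.DataFrame(
    {
        "col1": ["a", "a", "b", "b", "b"],
        "col2": ["x", "x", "y", "y", "z"],
        "col3": [1, 2, 3, 4, 5],
        "extra": [10, 20, 30, 40, 50],
    },
    index=[100, 101, 102, 103, 104],
)

# transform returns one value per input row, aligned to df's own index.
df["count"] = df.groupby(["col1", "col2"])["col3"].transform("count")
print(df)
```

Because the result has the same length and index as df, it can be assigned directly as a new column without reset_index or a merge.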

answered Sep 17 '22 15:09

Scott Boston

Tags: python, indexing, pandas, dataframe, pandas-groupby
