How to get back the index after groupby in pandas

Tags:

pandas

I am trying to find the record with the maximum value among the first records of each group after a groupby, and then delete that record from the original dataframe.

import pandas as pd
df = pd.DataFrame({'item_id': ['a', 'a', 'b', 'b', 'b', 'c', 'd'],
                   'cost': [1, 2, 1, 1, 3, 1, 5]})
print(df)
t = df.groupby('item_id').first()  # lost track of the original index
desired_row = t[t.cost == t.cost.max()]
# delete this row from df

         cost
item_id      
d           5

I need to keep track of desired_row and delete this row from df and repeat the process.

What is the best way to find and delete the desired_row?


asked Aug 22 '17 23:08

2 Answers

I am not sure of a general way, but this will work in your case since you are taking the first item of each group (it would work just as easily for the last). Because of the general nature of split-apply-combine, I don't think this is easily achievable without tracking the index yourself.

gb = df.groupby('item_id', as_index=False)
>>> gb.groups  # Index labels of the rows in each group.
{'a': [0, 1], 'b': [2, 3, 4], 'c': [5], 'd': [6]}

# Get the first index label from each group using a dictionary comprehension.
subset = {k: v[0] for k, v in gb.groups.items()}
df2 = df.loc[list(subset.values())]
# These are the first items in each group.
>>> df2
  item_id  cost
0       a     1
2       b     1
5       c     1
6       d     5

# Exclude any items from above where the cost is equal to the max cost across the first item in each group.
>>> df[~df.index.isin(df2[df2.cost == df2.cost.max()].index)]
  item_id  cost
0       a     1
1       a     2
2       b     1
3       b     1
4       b     3
5       c     1
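The same idea can be sketched without building the dictionary by hand, assuming a pandas version where `GroupBy.head` returns the selected rows with their original index labels. The names `firsts`, `label` and `remaining` are only illustrative. Unlike `first()`, `head(1)` takes the first row as-is rather than the first non-null value per column, which makes no difference for this data.

```python
import pandas as pd

df = pd.DataFrame({'item_id': ['a', 'a', 'b', 'b', 'b', 'c', 'd'],
                   'cost': [1, 2, 1, 1, 3, 1, 5]})

# head(1) keeps the first row of each group together with its original index label.
firsts = df.groupby('item_id').head(1)

# idxmax returns the index label of the first row holding the maximum cost.
label = firsts['cost'].idxmax()

# Drop that label from the original frame; df itself is not modified.
remaining = df.drop(index=label)
print(remaining)
```

If several groups tie for the maximum, this removes only the first of them, whereas the `isin` filter above removes all tied rows.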


answered Oct 17 '22 14:10

Alexander

Try this:

import pandas as pd
df = pd.DataFrame({'item_id': ['a', 'a', 'b', 'b', 'b', 'c', 'd'],
                   'cost': [1, 2, 1, 1, 3, 1, 5]})
t = df.drop_duplicates(subset=['item_id'], keep='first')
desired_row = t[t.cost == t.cost.max()]
df[~df.index.isin([desired_row.index[0]])]

Out[186]:
  item_id  cost
0       a     1
1       a     2
2       b     1
3       b     1
4       b     3
5       c     1
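Since the question also asks to repeat the process, here is a rough sketch of how the `drop_duplicates` approach might be wrapped in a loop. The helper `pop_max_first` and the variables `work` and `removed` are hypothetical names, and looping until the frame is empty is an assumption, since the question does not say when to stop.

```python
import pandas as pd

df = pd.DataFrame({'item_id': ['a', 'a', 'b', 'b', 'b', 'c', 'd'],
                   'cost': [1, 2, 1, 1, 3, 1, 5]})


def pop_max_first(frame):
    """Return (label, frame without that row) for the max-cost first row."""
    # First row of each item_id; drop_duplicates keeps the original index labels.
    firsts = frame.drop_duplicates(subset='item_id', keep='first')
    # Label of the first row holding the maximum cost among those first rows.
    label = firsts['cost'].idxmax()
    return label, frame.drop(index=label)


removed = []      # index labels in the order they were removed
work = df.copy()  # work on a copy so the original df stays intact
while not work.empty:
    label, work = pop_max_first(work)
    removed.append(label)

print(removed)
```

Each pass recomputes the first rows, so the row that follows a removed record becomes its group's new first row on the next pass.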

answered Oct 17 '22 14:10

BENY
