
How to get back the index after groupby in pandas

Tags: python, pandas

I am trying to find the record with the maximum value among the first records of each group after a groupby, and then delete that record from the original dataframe.

import pandas as pd
df = pd.DataFrame({'item_id': ['a', 'a', 'b', 'b', 'b', 'c', 'd'],
                   'cost': [1, 2, 1, 1, 3, 1, 5]})
print(df)
t = df.groupby('item_id').first()  # the original row index is lost here
desired_row = t[t.cost == t.cost.max()]
print(desired_row)
# delete this row from df

         cost
item_id      
d           5

I need to keep track of desired_row and delete this row from df and repeat the process.

What is the best way to find and delete the desired_row?

asked Aug 22 '17 23:08 by learner




2 Answers

I am not sure of a general way, but this will work in your case since you are taking the first item of each group (it would work just as easily for the last). In fact, because of the general nature of split-apply-combine, I don't think this is easily achievable without doing it yourself.

gb = df.groupby('item_id', as_index=False)
>>> gb.groups  # Index labels of the rows in each group.
{'a': [0, 1], 'b': [2, 3, 4], 'c': [5], 'd': [6]}

# Get the first index label from each group using a dictionary comprehension.
subset = {k: v[0] for k, v in gb.groups.items()}
# gb.groups holds index labels, so select with .loc (and pass a list, not a dict view).
df2 = df.loc[list(subset.values())]
# These are the first items in each groupby.
>>> df2
   cost item_id
0     1       a
5     1       c
2     1       b
6     5       d

# Exclude any items from above where the cost is equal to the max cost across the first item in each group.
>>> df[~df.index.isin(df2[df2.cost == df2.cost.max()].index)]
   cost item_id
0     1       a
1     2       a
2     1       b
3     1       b
4     3       b
5     1       c
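
As a variation on the same idea, here is a minimal sketch that assumes a reasonably recent pandas and Python 3. It relies on groupby(...).head(1), which returns the first row of each group with its original index label intact, so the label to drop can be read off directly. Note one difference from the isin filter above: idxmax returns only the first label when several first rows tie for the maximum.

```python
import pandas as pd

df = pd.DataFrame({'item_id': ['a', 'a', 'b', 'b', 'b', 'c', 'd'],
                   'cost': [1, 2, 1, 1, 3, 1, 5]})

# head(1) keeps the first row of every group and, unlike first(),
# preserves the original index labels.
firsts = df.groupby('item_id').head(1)

# Label of the row with the highest cost among those first rows
# (on ties, idxmax returns the first such label).
label = firsts['cost'].idxmax()

# Drop that row from the original frame by its label.
remaining = df.drop(index=label)

print(firsts.index.tolist())     # [0, 2, 5, 6]
print(label)                     # 6
print(remaining.index.tolist())  # [0, 1, 2, 3, 4, 5]
```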
answered Oct 17 '22 14:10 by Alexander


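Try this. drop_duplicates with keep='first' keeps the first row of each item_id along with its original index: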

import pandas as pd
df = pd.DataFrame({'item_id': ['a', 'a', 'b', 'b', 'b', 'c', 'd'],
                   'cost': [1, 2, 1, 1, 3, 1, 5]})
t = df.drop_duplicates(subset=['item_id'], keep='first')
desired_row = t[t.cost == t.cost.max()]
df[~df.index.isin([desired_row.index[0]])]

Out[186]: 
   cost item_id
0     1       a
1     2       a
2     1       b
3     1       b
4     3       b
5     1       c
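
The question also asks to repeat the process. A rough sketch of how that loop might look, built on the drop_duplicates approach above: the helper name drop_max_first and its key/value parameters are made up for illustration, and ties are resolved by taking the first matching row, which may or may not be what you want.

```python
import pandas as pd


def drop_max_first(frame, key='item_id', value='cost'):
    """Hypothetical helper: among the first rows of each group, find the
    one with the largest value and return (its label, frame without it)."""
    firsts = frame.drop_duplicates(subset=[key], keep='first')
    label = firsts[value].idxmax()  # first label wins on ties
    return label, frame.drop(index=label)


df = pd.DataFrame({'item_id': ['a', 'a', 'b', 'b', 'b', 'c', 'd'],
                   'cost': [1, 2, 1, 1, 3, 1, 5]})

removed = []  # original index labels, in the order they were removed
work = df
while not work.empty:
    label, work = drop_max_first(work)
    removed.append(int(label))

print(removed)  # [6, 0, 1, 2, 3, 4, 5]
```

Because each pass returns a new frame instead of modifying df in place, the original data stays available for comparison.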
answered Oct 17 '22 14:10 by BENY