
How do I get the index of each item in a groupby object in Pandas?

Tags:

python

pandas

I use groupby on a dataframe based on the columns I want, and then I need the position of each row within its group. By position I mean that if a group has 10 rows, the values run from 0 to 9; I do not mean the dataframe index.

My code for doing this is below:

import numpy as np
import pandas as pd

df = pd.DataFrame({'A': np.random.randint(0, 11, 10 ** 3), 'B': np.random.randint(0, 11, 10 ** 3), 
                   'C': np.random.randint(0, 11, 10 ** 3), 'D': np.random.randint(0, 2, 10 ** 3)})

grouped_by = df.groupby(["A", "B", "C"])
groups = dict(list(grouped_by))
index_dict = {k: v.index.tolist() for k,v in groups.items()}
df["POS"] = df.apply(lambda x: index_dict[(x["A"], x["B"], x["C"])].index(x.name), axis=1)

The dataframe here is just an example.

Is there a way to use grouped_by to achieve this?

IordanouGiannis asked Sep 27 '22 01:09


1 Answer

Here's a solution using cumcount() on a dummy column to generate an item index within each group. It should be significantly faster too, since it avoids the row-wise apply and the list lookups.

df['dummy'] = 0
df["POS"] = df.groupby(['A','B','C'])['dummy'].cumcount()
df = df.drop('dummy', axis=1)

As @unutbu noted, it is even cleaner to just use:

df["POS"] = df.groupby(['A','B','C']).cumcount()
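To make the behaviour concrete, here is a minimal sketch on a tiny frame. The data and the single grouping column are made up for illustration (they are not the asker's dataframe), and the POS_REV column is an extra showing the ascending=False option:

```python
import pandas as pd

# Hypothetical toy data: two groups, "x" and "y", interleaved
df = pd.DataFrame({"A": ["x", "y", "x", "x", "y"],
                   "D": [10, 20, 30, 40, 50]})

# cumcount() numbers the rows of each group from 0, in original row order
df["POS"] = df.groupby("A").cumcount()

# ascending=False counts down instead, so the last row of a group gets 0
df["POS_REV"] = df.groupby("A").cumcount(ascending=False)

print(df["POS"].tolist())      # [0, 0, 1, 2, 1]
print(df["POS_REV"].tolist())  # [2, 1, 1, 0, 0]
```

The result is aligned with the original dataframe index, so it can be assigned straight back as a new column without sorting or merging.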
chrisb answered Sep 30 '22 23:09