 

Replace the values in each group with the group's first value (Pandas groupby)

Here is a dataframe:

import pandas as pd

df = pd.DataFrame({'A': ['foo', 'foo', 'bar', 'bar', 'bar'],
                   'B': ['1', '2', '2', '4', '1']})

I want each value in B to be replaced by the first B value of its group in A, so the foo rows all get '1' and the bar rows all get '2'.

Here is what I tried, which did not work:

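groups = df.groupby('A')
# Fails: Series.first() expects a date-offset argument (and is deprecated in recent pandas);
# even with the right first value, this boolean filter drops rows instead of replacing values
groups.apply(lambda g: g[g['B'] == g['B'].first()]).reset_index(drop=True)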
asked Oct 29 '25 at 08:10 by A.Z

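The per-group idea behind the attempt can be made to work by taking each group's first value positionally with .iloc[0] instead of Series.first(), and assigning it to the whole group instead of filtering. A minimal sketch, assuming the question's DataFrame; the name result is chosen here for illustration:

```python
import pandas as pd

df = pd.DataFrame({'A': ['foo', 'foo', 'bar', 'bar', 'bar'],
                   'B': ['1', '2', '2', '4', '1']})

# Walk the groups in order of first appearance (sort=False), overwrite B in
# each group with that group's first B value (.iloc[0] is positional), then
# concatenate the pieces and restore the original row order by index.
result = pd.concat(
    [g.assign(B=g['B'].iloc[0]) for _, g in df.groupby('A', sort=False)]
).sort_index()

print(result)
```

This is more verbose than the answers below, but it mirrors the per-group logic the question was reaching for.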


2 Answers

You can do:

df['B'] = df.groupby('A')['B'].transform('first')

Or, if the rows of each group are already contiguous, as shown:

df['B'] = df['B'].mask(df['A'].duplicated()).ffill()

Output:

     A  B
0  foo  1
1  foo  1
2  bar  2
3  bar  2
4  bar  2
answered Oct 31 '25 at 05:10 by Quang Hoang

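A small self-contained comparison of the two approaches in the answer above, wrapped in hypothetical helper functions (first_via_transform and first_via_ffill are names chosen for this sketch). On the question's data they agree; on a made-up interleaved frame, the mask/ffill version goes wrong, which is why it needs each group's rows to be contiguous:

```python
import pandas as pd

def first_via_transform(df):
    # Each row gets the first B of its A group, regardless of row order.
    return df.groupby('A')['B'].transform('first')

def first_via_ffill(df):
    # Blank out every row whose A value was already seen, then forward-fill.
    # Only correct when the rows of each group are contiguous.
    return df['B'].mask(df['A'].duplicated()).ffill()

contiguous = pd.DataFrame({'A': ['foo', 'foo', 'bar', 'bar', 'bar'],
                           'B': ['1', '2', '2', '4', '1']})
interleaved = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar'],
                            'B': ['1', '2', '3', '4']})

print(first_via_transform(contiguous).tolist())   # ['1', '1', '2', '2', '2']
print(first_via_ffill(contiguous).tolist())       # ['1', '1', '2', '2', '2']
print(first_via_transform(interleaved).tolist())  # ['1', '2', '1', '2']
print(first_via_ffill(interleaved).tolist())      # ['1', '2', '2', '2']
```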


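Use drop_duplicates + Index.repeat (this reproduces the original layout only when the rows of each group are contiguous, as in the example):

s = df.drop_duplicates('A')
# Repeat each group's first row by the group size; look the counts up by label,
# because value_counts() is ordered by frequency, not by first appearance
s = s.reindex(s.index.repeat(df['A'].value_counts()[s['A']].to_numpy()))

Output:

     A  B
0  foo  1
0  foo  1
2  bar  2
2  bar  2
2  bar  2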
answered Oct 31 '25 at 04:10 by BENY

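Another drop_duplicates-based variant, swapped in here rather than taken from either answer: build a mapping from each A value to its first B value, then apply it to every row with Series.map. A minimal sketch, assuming the question's DataFrame (first_b is a hypothetical name). Unlike the reindex/repeat approach, it keeps the original index and does not require groups to be contiguous:

```python
import pandas as pd

df = pd.DataFrame({'A': ['foo', 'foo', 'bar', 'bar', 'bar'],
                   'B': ['1', '2', '2', '4', '1']})

# One row per group: drop_duplicates keeps the first occurrence by default.
first_b = df.drop_duplicates('A').set_index('A')['B']  # foo -> '1', bar -> '2'

# Look up each row's group in that mapping; index and row order are unchanged.
df['B'] = df['A'].map(first_b)
print(df)
```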