
Replace with first occurrence value for duplicate columns using pandas or python

Tags: python, pandas

I have data like this:

ca ca ca 120.00
ca cc cd 130.00
ca ca ca 135.23
ca ha ca 60.00
ca ha ca 50.00

If the first three columns of a row match those of an earlier row, the fourth column should take the value from the first occurrence. I want data like this:

ca ca ca 120.00
ca cc cd 130.00
ca ca ca 120.00
ca ha ca 60.00
ca ha ca 60.00

How can I do this with pandas or plain Python?

Sathish Kumar asked May 11 '26 15:05



1 Answer

Use GroupBy.transform with GroupBy.first. This broadcasts each group's first value back to every row of that group.

A dynamic solution selects the first 3 columns as a list of group keys, then assigns the transformed 4th column back:

df.iloc[:, 3] = df.groupby(df.columns[:3].tolist())[df.columns[3]].transform('first')
print(df)
    0   1   2      3
0  ca  ca  ca  120.0
1  ca  cc  cd  130.0
2  ca  ca  ca  120.0
3  ca  ha  ca   60.0
4  ca  ha  ca   60.0
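
To try the dynamic solution end to end, here is a minimal self-contained sketch. It assumes the question's data has been loaded into a DataFrame with default integer column labels (0 to 3); how the data is actually read in is not stated in the question, so the constructor call below is only an illustration.

```python
import pandas as pd

# Sample rows from the question; default integer column labels 0..3 are assumed.
df = pd.DataFrame([
    ["ca", "ca", "ca", 120.00],
    ["ca", "cc", "cd", 130.00],
    ["ca", "ca", "ca", 135.23],
    ["ca", "ha", "ca", 60.00],
    ["ca", "ha", "ca", 50.00],
])

# Group keys: the labels of the first three columns, as a plain list.
keys = df.columns[:3].tolist()

# transform('first') returns a Series aligned to df's index, holding the
# first value of each group repeated on every row of that group.
df.iloc[:, 3] = df.groupby(keys)[df.columns[3]].transform("first")

print(df)
```

Because transform keeps the original index and row order, the result can be assigned straight back without any merge or re-sorting.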

If the 4 columns are named a, b, c, d, the solution is simpler:

df['d'] = df.groupby(['a','b','c'])['d'].transform('first')
print(df)
    a   b   c      d
0  ca  ca  ca  120.0
1  ca  cc  cd  130.0
2  ca  ca  ca  120.0
3  ca  ha  ca   60.0
4  ca  ha  ca   60.0
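
One caveat that may matter for real data: by default, GroupBy.first returns the first non-null value in each group, which is not necessarily the value from the first row. The sketch below uses made-up sample data with a NaN (not part of the question) to show the difference, and uses a lambda with iloc[0] as one possible way to take the first row strictly by position.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the first row of the (ca, ha, ca) group has a missing value.
df = pd.DataFrame({
    "a": ["ca", "ca", "ca"],
    "b": ["ha", "ha", "cc"],
    "c": ["ca", "ca", "cd"],
    "d": [np.nan, 50.0, 130.0],
})
keys = ["a", "b", "c"]

# 'first' skips missing values, so the group gets 50.0 rather than NaN.
first_non_null = df.groupby(keys)["d"].transform("first")

# Taking iloc[0] per group keeps whatever is in the first row, including NaN.
first_row = df.groupby(keys)["d"].transform(lambda s: s.iloc[0])

print(pd.DataFrame({"first_non_null": first_non_null, "first_row": first_row}))
```

For the data in the question there are no missing values, so both variants give the same result.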
jezrael answered May 14 '26 05:05



