I have the following data frame:
import pandas as pd
df = pd.DataFrame([[1, 11, 'a'], [2, 12, 'b'], [1, 11, 'c'], [3, 12, 'd'], [3, 7, 'e'], [2, 12, 'f']])
df.columns = ['id', 'code', 'name']
print(df)
   id  code name
0   1    11    a
1   2    12    b
2   1    11    c
3   3    12    d
4   3     7    e
5   2    12    f
For the above data frame, I want only one value of the column name for each unique combination of the columns id and code. For example, the name for rows 0 and 2 should be the same, and the name for rows 1 and 5 should also be the same. The expected output is:
   id  code name
0   1    11    a
1   2    12    b
2   1    11    a
3   3    12    d
4   3     7    e
5   2    12    b
Please let me know how this can be done programmatically. I have to perform this operation on more than 100,000 rows.
Thanks
Let's use groupby, transform, and first:
df.assign(name=df.groupby(['id','code'])['name'].transform('first'))
Output:
   id  code name
0   1    11    a
1   2    12    b
2   1    11    a
3   3    12    d
4   3     7    e
5   2    12    b
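For anyone who wants to run this end to end, here is a minimal, self-contained sketch of the same groupby/transform idea. The helper name unify_names and its parameters are made up for illustration and are not part of pandas.

```python
import pandas as pd


def unify_names(df, keys=("id", "code"), value="name"):
    """Return a copy of df where `value` is the same for every row sharing `keys`.

    Hypothetical helper: it takes the first value seen in each group,
    in original row order, and broadcasts it back to all rows of that group.
    """
    first_per_group = df.groupby(list(keys))[value].transform("first")
    # assign() returns a new frame, so the input df is left untouched
    return df.assign(**{value: first_per_group})


df = pd.DataFrame(
    [[1, 11, "a"], [2, 12, "b"], [1, 11, "c"], [3, 12, "d"], [3, 7, "e"], [2, 12, "f"]],
    columns=["id", "code", "name"],
)
result = unify_names(df)
print(result)
```

Note that assign returns a new data frame, so the result has to be stored (for example with df = unify_names(df)) if the original should be replaced. Since transform works on whole groups rather than row by row, this should be fine for 100,000 rows, though I have not timed it.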
Alternatively, you can do this without groupby:
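import numpy as np
# Index labels of the row that sorts first within each (id, code) pair
A = df.sort_values(['id', 'code', 'name']).drop_duplicates(['id', 'code'], keep='first').index
# Blank out every other name (this modifies df in place), then forward-fill in sorted order
df.loc[~df.index.isin(A), 'name'] = np.nan
df.sort_values(['id', 'code', 'name']).ffill().sort_index()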
Out[603]:
   id  code name
0   1    11    a
1   2    12    b
2   1    11    a
3   3    12    d
4   3     7    e
5   2    12    b
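One thing to be aware of with the sort-based version: because it sorts by name before dropping duplicates, it keeps the alphabetically smallest name in each group, not the first one that appears, and it overwrites df along the way. With the sample data both give the same result. If the first occurrence is what matters, a possible groupby-free sketch is to take the first row per key with drop_duplicates and merge it back. The function name unify_names_by_merge is an assumption made up for this example.

```python
import pandas as pd


def unify_names_by_merge(df, keys=("id", "code"), value="name"):
    """Hypothetical helper: broadcast the first-occurring `value` per key combination."""
    keys = list(keys)
    # One row per key combination, keeping the first occurrence in row order
    firsts = df.drop_duplicates(keys, keep="first")[keys + [value]]
    # A left merge keeps the rows of the left frame in their original order
    merged = df[keys].merge(firsts, on=keys, how="left")
    # Copy the values positionally so the original index of df is preserved
    return df.assign(**{value: merged[value].to_numpy()})


df = pd.DataFrame(
    [[1, 11, "a"], [2, 12, "b"], [1, 11, "c"], [3, 12, "d"], [3, 7, "e"], [2, 12, "f"]],
    columns=["id", "code", "name"],
)
result = unify_names_by_merge(df)
print(result)

# Here the first occurrence ("z") is not the alphabetically smallest name
unsorted_df = pd.DataFrame([[1, 11, "z"], [1, 11, "a"]], columns=["id", "code", "name"])
unsorted_result = unify_names_by_merge(unsorted_df)
```

This leaves the input data frame unchanged and returns a new one.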