Change the dataframe column values based on unique combination of other columns

Tags: python, pandas

I have the following data frame:

 import pandas as pd

 df=pd.DataFrame([[1,11,'a'],[2,12,'b'],[1,11,'c'],[3,12,'d'],[3,7,'e'],
    [2,12,'f']])
 df.columns=['id','code','name']

 print(df)


     id  code name
  0   1    11    a
  1   2    12    b
  2   1    11    c
  3   3    12    d
  4   3     7    e
  5   2    12    f

For the above data frame, I want a single value in column 'name' for each unique combination of columns 'id' and 'code'. For example, rows 0 and 2 should have the same name, and so should rows 1 and 5. The desired result is:

       id  code name
   0   1    11    a
   1   2    12    b
   2   1    11    a
   3   3    12    d
   4   3     7    e
   5   2    12    b

Please let me know how this can be done programmatically. I have to perform this operation on more than 100,000 rows.

Thanks

Venkatesh Malhotra asked Dec 24 '22 15:12


2 Answers

Let's use groupby, transform, and first:

df.assign(name=df.groupby(['id','code'])['name'].transform('first'))

Output:

   id  code name
0   1    11    a
1   2    12    b
2   1    11    a
3   3    12    d
4   3     7    e
5   2    12    b
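
For reference, here is a self-contained sketch of the approach above, rebuilding the frame from the question. Note that `assign` returns a new DataFrame and leaves `df` unchanged, so the result has to be captured; the variable name `result` is only illustrative. `transform('first')` takes the first non-null `name` in each (id, code) group and broadcasts it back to every row of that group.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 11, 'a'], [2, 12, 'b'], [1, 11, 'c'], [3, 12, 'd'], [3, 7, 'e'], [2, 12, 'f']],
    columns=['id', 'code', 'name'],
)

# For every (id, code) group, take the first name and broadcast it to all rows of the group.
# `result` is an illustrative name; df itself is not modified by assign().
result = df.assign(name=df.groupby(['id', 'code'])['name'].transform('first'))

print(result)
```

To overwrite the column in place instead, the same transform can be assigned directly: `df['name'] = df.groupby(['id','code'])['name'].transform('first')`.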
Scott Boston answered May 22 '23 03:05


Alternatively, you can do this without groupby:

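import numpy as np
# Index of the row with the alphabetically smallest name in each (id, code) group
A=df.sort_values(['id','code','name']).drop_duplicates(['id','code'],keep='first').index
df.loc[~df.index.isin(A),'name']=np.nan
# NaN sorts last within each group, so ffill copies the kept name down; sort_index restores row order
df.sort_values(['id','code','name']).ffill().sort_index()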


Output:
   id  code name
0   1    11    a
1   2    12    b
2   1    11    a
3   3    12    d
4   3     7    e
5   2    12    b
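
One caveat about the snippet above: because `name` is among the sort keys, it keeps the alphabetically smallest name in each group, which only happens to match the first occurrence in the question's data. If the first occurrence in the original row order is what is wanted, a possible groupby-free sketch is `drop_duplicates` followed by a left merge. This is a different technique from the one in the answer, and the sample data and the names `first_names` and `merged` are assumptions made for illustration.

```python
import pandas as pd

# Illustrative data where the first occurrence ('z') is not the alphabetically smallest name.
df = pd.DataFrame(
    [[1, 11, 'z'], [2, 12, 'b'], [1, 11, 'a'], [3, 12, 'd']],
    columns=['id', 'code', 'name'],
)

# One row per (id, code): the first occurrence in the original row order.
first_names = df.drop_duplicates(['id', 'code'], keep='first')

# A left merge keeps the row order of the left frame and attaches the kept name to every row.
merged = df[['id', 'code']].merge(first_names, on=['id', 'code'], how='left')

print(merged)
```

The merged frame gets a fresh 0..n-1 index, so if the original frame has a meaningful index it would need to be restored afterwards.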
BENY answered May 22 '23 04:05