Pandas - match two columns from two data frames and create new column in df1

I have two data frames

df1

Srlno  id   image
1       3   image1.jpg
2       3   image2.jpg
3       3   image2.jpg

df2

Srlno  id   image
1       1   image1.jpg
2       2   image2.jpg
3       3   image3.jpg

I want to match the two data frames on the image column and bring the id from df2 into df1 as a new column. The image names in df2 are unique, whereas the image names in df1 contain many duplicates. I want to keep the duplicate image names in df1 but fill in the correct id for each image from df2.

The expected output is :

Srlno  id   image        newids
1       3   image1.jpg   1
2       3   image2.jpg   2
3       3   image2.jpg   2

I tried with

df1['newids'] = df1['image'].map(df2.set_index('image')['id'])

This returns an error: InvalidIndexError('Reindexing only valid with uniquely valued index objects'). I understand the duplicates in df1 are creating this error, but I don't know how to resolve it.
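A minimal sketch of one likely reading of this error: map() with a Series needs the lookup index to be unique, so the message usually points at repeated image names in the frame passed to set_index (df2), not at repeats in df1. The frame df2_dup below is hypothetical, built only to mimic that situation, and keeping the first row per image is an assumption about which id should win.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Srlno": [1, 2, 3],
    "id": [3, 3, 3],
    "image": ["image1.jpg", "image2.jpg", "image2.jpg"],
})

# Hypothetical df2 with a repeated image name, to mimic the failing case.
df2_dup = pd.DataFrame({
    "Srlno": [1, 2, 3, 4],
    "id": [1, 2, 3, 2],
    "image": ["image1.jpg", "image2.jpg", "image3.jpg", "image2.jpg"],
})

# True here means set_index('image') would produce a non-unique index.
has_duplicate_keys = df2_dup["image"].duplicated().any()

# Keep the first row per image (an assumption), then build the lookup Series.
lookup = df2_dup.drop_duplicates(subset="image").set_index("image")["id"]

# Duplicates in df1 are fine: each row simply looks up its own image.
df1["newids"] = df1["image"].map(lookup)
print(df1)
```

Checking df2["image"].duplicated().any() on the real data would confirm whether this is what is happening.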

Apricot asked Jan 02 '23 06:01

1 Answer

Another solution with dict(zip())

df1['newids'] = df1['image'].map(dict(zip(df2['image'], df2['id'])))
print(df1)

   Srlno  id       image  newids
0      1   3  image1.jpg       1
1      2   3  image2.jpg       2
2      3   3  image2.jpg       2
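
For anyone who wants to run this end to end, here is a self-contained sketch that rebuilds the two frames from the question and applies the dict(zip()) mapping. It also shows a left merge as an alternative; the merge is a different technique from the one above, included only for comparison. The names image_to_id, mapped and merged are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Srlno": [1, 2, 3],
    "id": [3, 3, 3],
    "image": ["image1.jpg", "image2.jpg", "image2.jpg"],
})
df2 = pd.DataFrame({
    "Srlno": [1, 2, 3],
    "id": [1, 2, 3],
    "image": ["image1.jpg", "image2.jpg", "image3.jpg"],
})

# Build a plain image -> id dictionary and map every df1 row through it.
image_to_id = dict(zip(df2["image"], df2["id"]))
mapped = df1.assign(newids=df1["image"].map(image_to_id))

# Alternative: a left merge keeps every df1 row and pulls in df2's id.
merged = df1.merge(
    df2[["image", "id"]].rename(columns={"id": "newids"}),
    on="image",
    how="left",
)

print(mapped)
print(merged)
```

One thing to keep in mind: a dict keeps only the last id seen for a repeated key, so if df2 ever contains duplicate image names the mapping will not raise but will silently use the last one. Images in df1 that are missing from df2 come back as NaN with either approach.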

anky answered Jan 04 '23 22:01