Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to find the indices of matching rows between 2 data frames

I have 2 dataframes with a partially matching column:

data_a = ['a', 'b', 'c', 'd', 'e', 'f']
data_b = ['b', 'd', 'f', 'h']
df1 = pd.DataFrame(data_a, columns=['data'])
df2 = pd.DataFrame(data_b, columns=['data'])

I would like to get the indices match between the data in df2 to df1. Something like:

indices = [1, 3, 5, None]

So the 0 index row of df2 matches the 1 index row of df1, the 1 index row of df2 matches the 3 index row of df1, etc...

If no match, then the value is None or any negative value.
It may be assumed there will be a single match (The data columns are unique in their values).

I could do it in a loop, yet it is slow.
I am looking for efficient way to get this or something similar.

like image 574
Avi T Avatar asked Dec 06 '25 07:12

Avi T


2 Answers

Should work as long as values in df1 are unique and hashable --

>>> s1 = df1['data']
>>> s2 = df2['data']

>>> s3 = pd.Series(data=s1.index, index=s1)
>>> s3
data
a    0
b    1
c    2
d    3
e    4
f    5
dtype: int64

>>> s2.map(s3)
0    1.0
1    3.0
2    5.0
3    NaN
Name: data, dtype: float64

>>> s2.map(s3).fillna(-1).astype(int).to_list()
[1, 3, 5, -1]
like image 196
Chris Fu Avatar answered Dec 08 '25 21:12

Chris Fu


reset_index, then left-merge:

df2.merge(df1.reset_index(), on='data', how='left')

Output:

  data  index
0    b    1.0
1    d    3.0
2    f    5.0
3    h    NaN

Note that if you have several matches, this will create additional rows:

data_a = ['a', 'b', 'c', 'd', 'b', 'f']
data_b = ['b', 'd', 'f', 'h']
df1 = pd.DataFrame(data_a, columns=['data'])
df2 = pd.DataFrame(data_b, columns=['data'])
out = df2.merge(df1.reset_index(), on='data', how='left')

  data  index
0    b    1.0
1    b    4.0
2    d    3.0
3    f    5.0
4    h    NaN
like image 26
mozway Avatar answered Dec 08 '25 21:12

mozway