Finding nearly identical rows between data frames

Question

I have the following two data frames:

df1 = data_frame(x = c(1128.4, 1101.2), y = c(124.5, 325.2))

df2 = data_frame(x = c(1128.7, 1100.5, 1527.8, 1347.5),
                 y = c(83.2, 124.2, 370.3, 325.5))

I would like to find the rows in df1 that are nearly identical (within 1% in either direction) to rows in df2. The method needs to be efficient, because df1 will have hundreds of rows and df2 will be much larger.

The expected output would be, e.g. a list:

L$x = c(1,2)
L$y = c(2,4)

giving, for each column, the rows of df2 whose values are similar to those in rows 1 and 2 of df1.

If the task were to find completely identical rows, I would of course use left_join, with the smaller df1 on the left.

Is there an efficient way to do this? In general, I would like to do this for multiple columns as well.

denis · Accepted Answer

With data.table, a rolling join to the nearest value is a fast way to do this. The example below matches on the x column only:

library(data.table)
df1 = data.table(x = c(1128.4, 1101.2))
df2 = data.table(x = c(1128.7, 1100.5, 1527.8, 1347.5))

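# Key df2 on x and copy x into y; after the join, x holds the df1 value
setkey(df2, x)
df2[, y := x]
# Rolling join to the nearest df2 value, then keep matches within 1%
df2[J(df1$x), roll = "nearest"][abs(x - y) / y < 0.01]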

        x      y
1: 1128.4 1128.7
2: 1101.2 1100.5
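
The answer above covers only the x column and returns the matched values rather than row positions. As a rough sketch of how the same rolling join might be extended to several columns and to the list-of-indices output requested in the question, something like the following could work. The helper name nearest_within, its tol argument, and the ref and idx names are hypothetical and only for illustration. It also assumes that checking the single nearest value per column is enough, which could miss a match in rare edge cases because the 1% tolerance is relative.

```r
library(data.table)

df1 <- data.table(x = c(1128.4, 1101.2), y = c(124.5, 325.2))
df2 <- data.table(x = c(1128.7, 1100.5, 1527.8, 1347.5),
                  y = c(83.2, 124.2, 370.3, 325.5))

# Hypothetical helper: for each value in a, return the position in b of the
# nearest value, or NA if that value differs from it by tol (1%) or more.
nearest_within <- function(a, b, tol = 0.01) {
  # Carry the original positions along, because setkey() sorts the table
  ref <- data.table(val = b, idx = seq_along(b))
  setkey(ref, val)
  # One result row per element of a, in the order of a
  hit <- ref[J(a), roll = "nearest", mult = "first"]
  matched <- b[hit$idx]
  ifelse(abs(a - matched) / abs(matched) < tol, hit$idx, NA_integer_)
}

# Apply the helper column by column and collect the results in a named list
cols <- c("x", "y")
L <- lapply(setNames(cols, cols),
            function(col) nearest_within(df1[[col]], df2[[col]]))
```

With the sample data, L$x is c(1, 2) and L$y is c(2, 4), which is the output the question asks for. Each column is matched independently here; requiring a single df2 row to match on all columns at once would need a different join.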

Tags:

dataframe

r

vectorization

Omry Atia

1 Answers

denis
