I have two data frames with mostly the same values
df1:
v1 v2 v3 v4 v5 v6 v7 ......
500 40 5.2 z1 .....
500 40 7.2 z2 .....
500 40 9.0 z3 .....
500 40 3.5 z4 .....
500 40 4.2 z5 .....
df2:
v1 v2 v3 v4 v5 v6 v7 .....
500 40 5.1 m1 .....
500 40 7.9 m2 .....
500 20 8.6 m3 .....
500 40 3.7 m4 .....
500 40 4.0 m5 .....
I would like to merge (or any function like it) so that my new df1 file has exact matching v1 and v2, but v3 does not need to be strictly exact. Is there a way I can match v3 to within +/- 0.2?
I would like the final df1 to look like:
v1 v2 v3 v4 v5 v6 v7 .....
500 40 5.2 z1 .....
500 40 3.5 z4 .....
500 40 4.2 z5 .....
I get as far as below, but I'm not sure how to account for the variability of column v3.
hed <- c("v1", "v2", "v3") #original data didn't have header
df1_final <- merge(df1, df2[hed],by=hed)
If there is a better language to deal with this I'd accept that too, but this is only one part to an entire R script i'm working on.
Using the tidyverse
we could first join
, then filter
with near
(and a tolerance):
library(tidyverse)
df1 <- data_frame(v1 = c(500, 500, 500, 500, 500),
v2 = c(40, 40, 40, 40, 40),
v3 = c(5.2, 7.2, 9.0, 3.5, 4.2),
v4 = c("z1", "z2", "z3", "z4", "z5"))
df2 <- data_frame(v1 = c(500, 500, 500, 500, 500),
v2 = c(40, 40, 20, 40, 40),
v3 = c(5.1, 7.9, 8.6, 3.7, 4.0),
v4 = c("m1", "m2", "m3", "m4", "m5"))
df1 %>%
full_join(df2, by = c("v1", "v2")) %>% # join on v1 and v2
filter(near(v3.x, v3.y, tol = 0.21)) %>% # filter with a tolerance
rename(v3 = v3.x, v4 = v4.x) %>% # rename the columns
select(v1:v4) # select em
This yields
# A tibble: 3 x 4
v1 v2 v3 v4
<dbl> <dbl> <dbl> <chr>
1 500. 40. 5.20 z1
2 500. 40. 3.50 z4
3 500. 40. 4.20 z5
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With