I'd like to join two data frames if the seed column in data frame y is a partial match on the string column in x. This example should illustrate:
# What I have
library(dplyr)  # provides data_frame() (now deprecated in favour of tibble())
x <- data.frame(idX=1:3, string=c("Motorcycle", "TractorTrailer", "Sailboat"))
y <- data_frame(idY=letters[1:3], seed=c("ractor", "otorcy", "irplan"))
x
#>   idX         string
#> 1   1     Motorcycle
#> 2   2 TractorTrailer
#> 3   3       Sailboat
y
#> Source: local data frame [3 x 2]
#>
#>     idY   seed
#>   (chr)  (chr)
#> 1     a ractor
#> 2     b otorcy
#> 3     c irplan

# What I want
want <- data.frame(idX=c(1,2), idY=c("b", "a"),
                   string=c("Motorcycle", "TractorTrailer"),
                   seed=c("otorcy", "ractor"))
want
#>   idX idY         string   seed
#> 1   1   b     Motorcycle otorcy
#> 2   2   a TractorTrailer ractor
That is, something like
inner_join(x, y, by=stringr::str_detect(x$string, y$seed))
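The `by` argument of inner_join only accepts column names (or a join specification), so the line above will not run as written. For comparison, here is a minimal sketch of a different technique, not the fuzzyjoin approach from the answer below: form every pairing of rows with dplyr's cross_join (this assumes dplyr 1.1.0 or later, where cross_join was added), then keep only the pairs where the seed occurs in the string. It builds all nrow(x) * nrow(y) pairs, so it is only practical for small tables.

```r
library(dplyr)    # cross_join() requires dplyr >= 1.1.0
library(stringr)  # str_detect()

x <- data.frame(idX = 1:3,
                string = c("Motorcycle", "TractorTrailer", "Sailboat"),
                stringsAsFactors = FALSE)
y <- data.frame(idY = letters[1:3],
                seed = c("ractor", "otorcy", "irplan"),
                stringsAsFactors = FALSE)

matched <- cross_join(x, y) %>%           # every (x row, y row) combination
  filter(str_detect(string, seed)) %>%    # keep pairs where seed is found in string
  arrange(idX)                            # order the result like x

matched
```

The result has the same rows as want (Motorcycle/otorcy and TractorTrailer/ractor); Sailboat and irplan drop out because they match nothing.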
Figure 3: dplyr left_join function. Unlike inner_join, left_join retains all rows of the data table that is passed to the function first (i.e., the x data). See the R documentation for a precise definition.
Example 3: right_join dplyr R function
Figure 7: dplyr anti_join function. The anti_join function keeps only the rows of the left-hand data that have no match in the right-hand data, and it keeps only the columns of the left-hand data. At this point you have learned the basic principles of the six dplyr join functions.
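The anti-join rule carries over to partial matching. A minimal sketch, assuming the fuzzyjoin package and its regex_anti_join function, applied to the question's data: it returns the rows of x whose string contains none of the seeds, and only x's columns.

```r
library(fuzzyjoin)

x <- data.frame(idX = 1:3,
                string = c("Motorcycle", "TractorTrailer", "Sailboat"),
                stringsAsFactors = FALSE)
y <- data.frame(idY = letters[1:3],
                seed = c("ractor", "otorcy", "irplan"),
                stringsAsFactors = FALSE)

# Anti join: keep rows of x that no seed in y matches; y's columns are not added.
anti <- regex_anti_join(x, y, by = c(string = "seed"))
anti
```

Only the Sailboat row remains, since neither "ractor", "otorcy" nor "irplan" occurs in it.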
This example explains how to extract rows with a partial match using the stringr package. We first need to install and load the stringr package; then we can subset our data with the str_detect function. The result contains only the rows where the Species column partially matches the character string “virg”.
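The code for this subset is not included in the text. A minimal sketch, assuming that the Species column refers to R's built-in iris data set: str_detect returns TRUE for every row whose Species contains "virg", and that logical vector is used to index the rows.

```r
# install.packages("stringr")  # run once if stringr is not installed yet
library(stringr)

# Assumption: the data are R's built-in iris data set.
# Species is a factor, so convert it to character before pattern matching.
iris_virg <- iris[str_detect(as.character(iris$Species), "virg"), ]

head(iris_virg)
```

In iris, "virg" matches only the species virginica, so the 50 virginica rows are kept.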
Right join is the reversed counterpart of left join: it retains all rows of the data table that is passed to the function second (i.e., the y data).
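To see the left-join and right-join rules on the question's partial-match problem, here is a minimal sketch, assuming the fuzzyjoin package's regex_left_join and regex_right_join functions: the left join keeps every row of x (Sailboat gets NA for idY and seed), and the right join keeps every row of y (irplan gets NA for idX and string).

```r
library(fuzzyjoin)

x <- data.frame(idX = 1:3,
                string = c("Motorcycle", "TractorTrailer", "Sailboat"),
                stringsAsFactors = FALSE)
y <- data.frame(idY = letters[1:3],
                seed = c("ractor", "otorcy", "irplan"),
                stringsAsFactors = FALSE)

# Left join: all rows of x are retained; unmatched rows get NA in y's columns.
left <- regex_left_join(x, y, by = c(string = "seed"))

# Right join: all rows of y are retained; unmatched rows get NA in x's columns.
right <- regex_right_join(x, y, by = c(string = "seed"))

left
right
```

Both results contain the two matched pairs from the question plus one unmatched row each.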
The fuzzyjoin library has two functions, regex_inner_join and fuzzy_inner_join, that allow you to match partial strings:
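library(dplyr)  # provides the %>% pipe
x <- data.frame(idX=1:3, string=c("Motorcycle", "TractorTrailer", "Sailboat"))
y <- data.frame(idY=letters[1:3], seed=c("ractor", "otorcy", "irplan"))
# Before R 4.0, data.frame() turns strings into factors; make both key columns character
x$string = as.character(x$string)
y$seed = as.character(y$seed)

library(fuzzyjoin)
# Each value of y$seed is treated as a regular expression searched for in x$string
x %>% regex_inner_join(y, by = c(string = "seed"))
#>   idX         string idY   seed
#> 1   1     Motorcycle   b otorcy
#> 2   2 TractorTrailer   a ractor

library(stringr)
# fuzzy_inner_join lets you supply the matching function (here str_detect) yourself
x %>% fuzzy_inner_join(y, by = c("string" = "seed"), match_fun = str_detect)
#>   idX         string idY   seed
#> 1   1     Motorcycle   b otorcy
#> 2   2 TractorTrailer   a ractor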