Matching columns in 2 data frames when numbers don't exactly match

Question

How do I match two different data frames when the values I am comparing are not exactly the same?

I was thinking of using merge() but I am not sure.

Table1:

ID           Value.1
10001        x
18273-9      y
12824/5/6/7  z
10283/5/9    d

Table2:

ID           Value.2
10001        a
18274        b
12826        c
10289        u

How do I merge Table 1 and 2 based on ID?

Which specific function of the fuzzyjoin package would I use, especially for the "/" and "-" cases? How do I expand the "-" case, e.g. 18273-9, so that R registers it as 18273 / 18274 / 18275 / ... / 18279?

Patrik_P · Accepted Answer

You can write a function to extract the corresponding sequences from the strings containing "/" or "-" and recombine them into a new data.frame as follows:

df1 <- data.frame(ID = c("10001", "18273-9", "15273-8", "15170-4", "12824/5/6/7", "10283/5/9"),
                  value = c("a", "c", "c", "d", "k", "l"), stringsAsFactors = FALSE)

df2 <- data.frame(ID = c("10001", "18274", "12826", "10289"),
                  value = c("o", "p", "q", "r"), stringsAsFactors = FALSE)

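doIt <- function(df){
  # Turn a list of expanded IDs into a two-column data.frame,
  # repeating each row's value once per expanded ID
  listAsDF <- function(l, values) {
    data.frame(ID = as.character(unlist(l)),
               value = rep(values, lengths(l)),
               stringsAsFactors = FALSE)
  }
  # fixed = TRUE matches the literal characters, so no escaping is needed
  hasDash  <- grepl("-", df$ID, fixed = TRUE)
  hasSlash <- grepl("/", df$ID, fixed = TRUE)
  Base <- df[!hasDash & !hasSlash, ]
  #1 cases when - present: "18273-9" becomes 18273, 18274, ..., 18279
  #  (the suffix is assumed to replace only the last digit)
  temp <- df[hasDash, ]
  temp <- listAsDF(lapply(strsplit(temp$ID, "-", fixed = TRUE), function(e)
    seq(as.integer(e[1]), as.integer(paste0(substr(e[1], 1, nchar(e[1]) - 1), e[2])))),
    temp$value)
  Base <- rbind(Base, temp)
  #2 cases when / present: "12824/5/6/7" becomes 12824, 12825, 12826, 12827
  temp <- df[hasSlash, ]
  temp <- listAsDF(lapply(strsplit(temp$ID, "/", fixed = TRUE), function(a)
    c(a[1], paste0(substr(a[1], 1, nchar(a[1]) - 1), a[-1]))),
    temp$value)
  Base <- rbind(Base, temp)
  return(Base)
}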
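
The suffix logic inside doIt can be checked in isolation on the two ID shapes from the question. This is only a minimal sketch: the variable names are illustrative, and it assumes, as doIt does, that each suffix replaces just the last digit of the first number.

```r
# "-" case: the suffix gives the last digit of the end of the range
e <- strsplit("18273-9", "-", fixed = TRUE)[[1]]            # "18273" "9"
range_end <- paste0(substr(e[1], 1, nchar(e[1]) - 1), e[2]) # "18279"
range_ids <- seq(as.integer(e[1]), as.integer(range_end))
range_ids

# "/" case: each suffix gives the last digit of one further ID
a <- strsplit("12824/5/6/7", "/", fixed = TRUE)[[1]]        # "12824" "5" "6" "7"
list_ids <- c(a[1], paste0(substr(a[1], 1, nchar(a[1]) - 1), a[-1]))
list_ids
```

Here range_ids holds the integers 18273 through 18279, and list_ids holds "12824", "12825", "12826" and "12827".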

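Then you can merge the expanded df1 with df2. Because both data frames have a column named value, the result contains value.x and value.y; all.x = TRUE keeps the expanded IDs that have no match in df2, with NA in value.y:

merge(doIt(df1), df2, by = "ID", all.x = TRUE)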
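
If you also want to see which original ID each match came from, one option is to keep the unexpanded ID next to the expanded one before merging. The sketch below applies this to the tables from the question; expand_one, lookup and ID.original are hypothetical names introduced here, and the helper again assumes that each suffix replaces only the last digit.

```r
table1 <- data.frame(ID = c("10001", "18273-9", "12824/5/6/7", "10283/5/9"),
                     Value.1 = c("x", "y", "z", "d"),
                     stringsAsFactors = FALSE)
table2 <- data.frame(ID = c("10001", "18274", "12826", "10289"),
                     Value.2 = c("a", "b", "c", "u"),
                     stringsAsFactors = FALSE)

# Hypothetical helper: expand one ID string into every ID it stands for
expand_one <- function(id) {
  parts <- strsplit(id, "[-/]")[[1]]
  if (length(parts) == 1) return(id)              # plain ID, nothing to expand
  stem <- substr(parts[1], 1, nchar(parts[1]) - 1)
  full <- paste0(stem, parts[-1])
  if (grepl("-", id, fixed = TRUE)) {
    # range: every integer from the first number to the rebuilt end
    as.character(seq(as.integer(parts[1]), as.integer(full[1])))
  } else {
    # list: the first number plus each rebuilt alternative
    c(parts[1], full)
  }
}

# One row per expanded ID, carrying the original ID and its value along
expanded <- lapply(table1$ID, expand_one)
lookup <- data.frame(ID.original = rep(table1$ID, lengths(expanded)),
                     ID = unlist(expanded),
                     Value.1 = rep(table1$Value.1, lengths(expanded)),
                     stringsAsFactors = FALSE)

# Inner join: only expanded IDs that actually occur in table2 survive
matched <- merge(lookup, table2, by = "ID")
matched[, c("ID.original", "Value.1", "ID", "Value.2")]
```

With the question's data, each of the four rows in table2 finds exactly one row of table1, so matched has four rows. Use all.x = TRUE instead if the unmatched expanded IDs should be kept as well.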

Hope this helps!

Tags:

database

r

chu-js
