fuzzy join with stringdist_join() in R, Error: NAs are not allowed in subscripted assignments

Question

First of all, I am sorry if my formatting is bad; this is my first time posting (I am also new to programming and R).

I am trying to merge two data frames together on string variables. I am merging university names, which might not match up perfectly, so I was hoping to merge using a fuzzy or approximate string matching function. I was happy when I found the ‘fuzzyjoin’ package.

From CRAN: stringdist_join: Join two tables based on fuzzy string matching of their columns

stringdist_join(x, y, by = NULL, max_dist = 2,
  method = c("osa", "lv", "dl", "hamming", "lcs", "qgram", "cosine", "jaccard", "jw", "soundex"),
  mode = "inner", ignore_case = FALSE, distance_col = NULL, ...)

my code:

stringdist_left_join(new, institutions, by = c("tm_9_undergradu" = "Institution.Name"))

Error:

Error in dists[include] <- stringdist::stringdist(v1[include], v2[include],  : 
NAs are not allowed in subscripted assignments

I know that there are some NAs in these columns, but I am not sure how I could remove them, as I need them there as well. I know that in other join and merge functions the NAs are simply ignored. Does anyone know a way to get around this error for this package, or another way to do an approximate join on strings? Thank you for your help.

Luke Holcomb · Accepted Answer

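This answer worked for me and is from GitHub.

Step 1: Figure out which data frame has the NAs.

`which(is.na(df1))
which(is.na(df2))`

Step 2: Replace the NAs with something else, for example `df1[is.na(df1)] <- "empty_string"`. Note that this replaces NAs in every column; restrict it to the join column if the other columns should keep their NAs.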

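Step 3: Run the join (this is the code I was working with when I got the error).

`library(dplyr)
library(fuzzyjoin)

test1 <- msa_table %>%
  as_tibble() %>%
  # strip any parenthesised text from the key (backslashes doubled for R)
  mutate(msa = sub("\\(.*\\)", "", as.character(msa))) %>%
  # the piped table is x, so only the second table (df1) is passed here
  stringdist_full_join(df1, by = "msa", max_dist = 2)`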

After this, the error no longer appeared, although my joined tables still contained NAs.
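If the rows with a missing key need to survive the join as real NAs rather than as a placeholder string, one option is to set them aside, join the rest, and bind them back afterwards. The following is a minimal sketch under that assumption; the toy data frames and their values are made up for illustration, and only the column names are taken from the question.

```r
library(dplyr)
library(fuzzyjoin)

# Hypothetical toy data standing in for the two tables in the question
new <- tibble(
  id = 1:4,
  tm_9_undergradu = c("Stanford Univeristy", NA, "Harvard Universty", "MIT")
)
institutions <- tibble(
  Institution.Name = c("Stanford University", "Harvard University", NA),
  state = c("CA", "MA", NA)
)

# Split off rows whose join key is missing so the join never sees an NA
new_known   <- filter(new, !is.na(tm_9_undergradu))
new_missing <- filter(new, is.na(tm_9_undergradu))
inst_known  <- filter(institutions, !is.na(Institution.Name))

# Fuzzy left join on the rows that actually have a name to compare
joined <- stringdist_left_join(
  new_known, inst_known,
  by = c("tm_9_undergradu" = "Institution.Name"),
  max_dist = 2
)

# Put the NA-key rows back; their institution columns are filled with NA
result <- bind_rows(joined, new_missing) %>%
  arrange(id)
```

Here every row of `new` is kept, and rows with a missing or unmatched name simply carry NA in the columns that come from `institutions`.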

Hope this helps! Also, to be clear: this solution came from Anton Prokopyev '@prokopyev' on GitHub.
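For an approximate join without fuzzyjoin, base R's `adist()` can be used to pick the closest name. This is only a sketch: `closest_match()` is a hypothetical helper written for this example, the data are made up, and unlike `stringdist_join()` it keeps at most one match per row.

```r
# Hypothetical toy data standing in for the two tables in the question
new <- data.frame(
  id = 1:4,
  tm_9_undergradu = c("Stanford Univeristy", NA, "Harvard Universty", "MIT"),
  stringsAsFactors = FALSE
)
institutions <- data.frame(
  Institution.Name = c("Stanford University", "Harvard University", NA),
  state = c("CA", "MA", NA),
  stringsAsFactors = FALSE
)

# closest_match() is a hypothetical helper, not part of any package.
# For each element of x it returns the index of the nearest non-missing
# element of y (Levenshtein distance via adist()), or NA when x is missing
# or nothing lies within max_dist.
closest_match <- function(x, y, max_dist = 2) {
  candidates <- which(!is.na(y))
  vapply(x, function(xi) {
    if (is.na(xi) || length(candidates) == 0) return(NA_integer_)
    d <- adist(xi, y[candidates])[1, ]
    if (min(d) <= max_dist) candidates[which.min(d)] else NA_integer_
  }, integer(1), USE.NAMES = FALSE)
}

idx <- closest_match(new$tm_9_undergradu, institutions$Institution.Name)

# Indexing a data frame with NA yields an all-NA row, so unmatched rows stay
result <- cbind(new, institutions[idx, , drop = FALSE])
rownames(result) <- NULL
```

Because missing keys are skipped explicitly, NAs in either column never reach the distance calculation, and they remain NA in the output.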

Tags:

merge

r

dplyr

fuzzy-comparison

fuzzyjoin

Brian
