fuzzy join with stringdist_join() in R, Error: NAs are not allowed in subscripted assignments

First of all, I am sorry if my formatting is bad. This is my first time posting, and I am also new to programming and R.

I am trying to merge two data frames together on string variables. I am merging university names, which might not match up perfectly, so I was hoping to merge using a fuzzy or approximate string matching function. I was happy when I found the ‘fuzzyjoin’ package.

From the CRAN documentation: stringdist_join: Join two tables based on fuzzy string matching of their columns

stringdist_join(x, y, by = NULL, max_dist = 2, method = c("osa", "lv",
  "dl", "hamming", "lcs", "qgram", "cosine", "jaccard", "jw","soundex"), mode = "inner", ignore_case = FALSE, distance_col = NULL, ...)

my code:

stringdist_left_join(new, institutions, by = c("tm_9_undergradu" = "Institution.Name"))

Error:

Error in dists[include] <- stringdist::stringdist(v1[include], v2[include],  : 
NAs are not allowed in subscripted assignments

I know that there are some NAs in these columns, but I am not sure how I could remove them, as I need those rows to stay in the data. I know that in other join and merge functions the NAs are simply ignored. Does anyone know a way to get around this error for this package, or another way to do an approximate join on strings? Thank you for your help.

Brian asked Nov 01 '18 21:11



1 Answer

This answer worked for me and is from GitHub

Step 1: figure out which data frame has the NAs

`which(is.na(df1))`
`which(is.na(df2))`

Step 2: replace the NAs with something else, for example `df1[is.na(df1)] <- "empty_string"`

Step 3: run the join (the code I was working with when I got the error)

`test1 <- msa_table %>%
   as_tibble() %>%
   mutate(msa = sub("\\(.*\\)", "", as.character(msa))) %>%
   stringdist_full_join(df1, by = 'msa', max_dist = 2)`

After this, I no longer got the error, but I still had NAs in my tables.
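If you need to keep the NA rows as real NAs rather than overwrite them with a placeholder, another option is to set those rows aside, run the fuzzy join on the rest, and bind them back afterwards. The following is only a minimal sketch, assuming the dplyr and fuzzyjoin packages are installed. The toy data is made up for illustration; only the column names `tm_9_undergradu` and `Institution.Name` come from the question.

```r
library(dplyr)
library(fuzzyjoin)

# Hypothetical toy data standing in for the question's `new` and `institutions`.
new <- tibble(
  id = 1:4,
  tm_9_undergradu = c("Harvard Univ", NA, "Stanford University", "MIT")
)
institutions <- tibble(
  Institution.Name = c("Harvard Univ.", "Stanford University", NA),
  state = c("MA", "CA", NA)
)

# Rows with a missing name cannot be fuzzy-matched anyway, so set them aside.
new_missing <- filter(new, is.na(tm_9_undergradu))

# Fuzzy left join on the rows that do have a name on both sides.
matched <- new %>%
  filter(!is.na(tm_9_undergradu)) %>%
  stringdist_left_join(
    filter(institutions, !is.na(Institution.Name)),
    by = c("tm_9_undergradu" = "Institution.Name"),
    max_dist = 2
  )

# Add the set-aside rows back; bind_rows() fills the columns they lack with NA.
result <- bind_rows(matched, new_missing)
print(result)
```

The rows that had an NA name end up at the bottom of `result` with NA in the joined columns, which is roughly what a regular left join would give you. If row order matters, you can sort by an id column afterwards.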

Hope this helps! Also, to be clear: this solution came from Anton Prokopyev '@prokopyev' on GitHub.

Luke Holcomb answered Oct 12 '22 18:10
