I have two vectors in R. I want to find partial matches between them.
The first one is from a dataset named muc, which contains 6400 street names. muc$name looks like:
muc$name = c("Berberichweg", "Otto-Klemperer-Weg", "Feldmeierbogen", "Altostraße", ...)
The other vector is d_vector. It contains around 1400 names.
d_vector = c("Abel", "Abendroth", "von Abercron", "Abetz", "Abicht", "Abromeit", ...)
I want to find all street names that contain a name from d_vector anywhere in the street name.
First, I made some general adjustments after importing the CSV data (as variable d):
d_vector <- unlist(d$name)
d_vector <- as.vector(as.matrix(d_vector))
result <- unique(grep(paste(d_vector, collapse="|"), muc$Name, value=TRUE, ignore.case = TRUE))
result
But the result contains all the street names.
I also tried agrep, which returned an out-of-memory error.
When I tried d_vector %in% muc$name, it returned just one TRUE and hundreds of FALSE, which doesn't seem right.
Do you have any suggestions where my mistake could lie, or which library I could use? I am looking for something like Python's "fuzzywuzzy" for R.
In principle, your solution works fine with some dummy data:
streets = c("Berberichweg", "Otto-Klemperer-Weg", "Feldmeierbogen",
"Konrad-Adenauer-Platz", "anotherThing")
patterns = c("weg", "platz")
unique(grep(paste(patterns, collapse="|"), streets, value=TRUE, ignore.case = TRUE))
# [1] "Berberichweg" "Otto-Klemperer-Weg" "Konrad-Adenauer-Platz"
I think something is off with d_vector. Try checking class(d_vector), or run dput(d_vector) and paste the output here.
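One possible cause, and this is only an assumption because the question does not show the full d_vector: if the imported names include an empty string (for example from a blank CSV row), the pattern built with paste(..., collapse="|") contains an empty alternative, and an empty pattern matches every string. A minimal sketch of that check, using made-up data:

```r
streets <- c("Berberichweg", "Otto-Klemperer-Weg", "Feldmeierbogen", "Altostraße")

# Hypothetical name vector with a blank entry, stray whitespace and an NA,
# as might come out of a CSV import
d_vector <- c("Berber", "", " Klemperer ", NA)

# The blank entry yields an empty alternative ("Berber|| Klemperer |NA"),
# which matches every street name
bad_pattern <- paste(d_vector, collapse = "|")
all_matched <- grep(bad_pattern, streets, value = TRUE, ignore.case = TRUE)

# Clean up: trim whitespace, then drop NA and empty strings
clean <- trimws(d_vector)
clean <- clean[!is.na(clean) & nzchar(clean)]

result <- unique(grep(paste(clean, collapse = "|"), streets,
                      value = TRUE, ignore.case = TRUE))
result
```

Running sum(is.na(d_vector) | !nzchar(trimws(d_vector))) on the real vector would show whether any such entries exist.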
You can also try sapply and see whether that works:
matches = sapply(patterns, function(p) grep(p, streets, value=TRUE, ignore.case = TRUE))
# $weg
# [1] "Berberichweg" "Otto-Klemperer-Weg"
#
# $platz
# [1] "Konrad-Adenauer-Platz"
unique(unlist(matches))
# [1] "Berberichweg" "Otto-Klemperer-Weg" "Konrad-Adenauer-Platz"
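If the names might contain regex metacharacters, or if you also want to know which name matched which street, a per-name loop with literal matching is one option. This is a sketch with made-up data (d_names is a hypothetical stand-in for d_vector). Since fixed = TRUE does not honor ignore.case, both sides are lowercased first.

```r
streets <- c("Berberichweg", "Otto-Klemperer-Weg", "Feldmeierbogen",
             "Konrad-Adenauer-Platz")
# Hypothetical person names standing in for d_vector
d_names <- c("Klemperer", "Adenauer", "Abel")

streets_lc <- tolower(streets)

# For each name, collect the streets that contain it as a literal substring
hits <- lapply(tolower(d_names), function(n) {
  streets[grepl(n, streets_lc, fixed = TRUE)]
})
names(hits) <- d_names

# Keep only the names that matched at least one street
hits <- hits[lengths(hits) > 0]
hits

# All matching streets, without duplicates
matched_streets <- unique(unlist(hits, use.names = FALSE))
matched_streets
```

Matching one name at a time also avoids building a single very long alternation out of all 1400 names.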
Simple solution:
streets = c("Berberichweg", "Otto-Klemperer-Weg", "Feldmeierbogen" , "Altostraße")
streets = tolower(streets) #Lowercase all
names = c("Berber", "Weg")
names = tolower(names)
sapply(names, function (y) sapply(streets, function (x) grepl(y, x)))
#                    berber   weg
# berberichweg         TRUE  TRUE
# otto-klemperer-weg  FALSE  TRUE
# feldmeierbogen      FALSE FALSE
# altostraße          FALSE FALSE
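To turn a logical matrix like the one above into the list of matching streets, one option is to keep the rows that contain at least one TRUE. A sketch that rebuilds the same matrix with a single sapply (the variable names here are illustrative):

```r
streets <- tolower(c("Berberichweg", "Otto-Klemperer-Weg", "Feldmeierbogen", "Altostraße"))
patterns <- tolower(c("Berber", "Weg"))

# grepl is vectorized over streets, so one sapply over the patterns is enough;
# rows are streets, columns are patterns
m <- sapply(patterns, function(p) grepl(p, streets, fixed = TRUE))
rownames(m) <- streets

# Streets that contain at least one pattern
matched <- rownames(m)[rowSums(m) > 0]
matched
```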