Find matching strings between two vectors in R

I have two vectors in R. I want to find partial matches between them.

Q: How do you find the matching element between two vectors in R?

Use match(). It returns, for each element of the first vector, the position of its first exact match in the second vector, or NA if the element is not found.
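A minimal sketch of match(), using made-up vectors (`streets` and `wanted` are illustrative names, not the data from the question). Note that match() compares whole elements, so it does not find partial matches:

```r
# Illustrative data only; not the vectors from the question
streets <- c("Berberichweg", "Otto-Klemperer-Weg", "Feldmeierbogen")
wanted  <- c("Feldmeierbogen", "Abel", "Berberichweg")

# Position in `streets` of each element of `wanted`; NA when absent
pos <- match(wanted, streets)
pos
```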

Q: How do you check if two vectors are equal in R?

setequal() checks whether two vectors contain the same set of elements, ignoring order and duplicates, and returns TRUE or FALSE. To test whether two objects are exactly equal, use identical() instead.
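A short sketch of the difference between set equality and exact equality, with made-up vectors `a` and `b`:

```r
# Illustrative vectors: same elements, different order and multiplicity
a <- c("weg", "platz", "weg")
b <- c("platz", "weg")

same_set <- setequal(a, b)   # compares the sets of unique elements
same_vec <- identical(a, b)  # compares the vectors exactly
```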

Q: How do you find the intersection of two vectors in R?

intersect() returns the elements common to both vectors, with duplicates removed.
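A small sketch of intersect() on made-up name vectors. Like match(), it only finds elements that are identical in both vectors:

```r
# Illustrative vectors; "Abel" appears twice in the first one
x <- c("Abel", "Abetz", "Abel")
y <- c("Abetz", "Abel", "Abicht")

# Elements present in both vectors, each listed once
common <- intersect(x, y)
common
```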

My Data

The first one is from a dataset named muc, which contains 6400 street names. muc$name looks like:

muc$name = c("Berberichweg", "Otto-Klemperer-Weg", "Feldmeierbogen" , "Altostraße",...)

The other vector is d_vector. It contains around 1400 names.

d_vector = c("Abel", "Abendroth", "von Abercron", "Abetz", "Abicht", "Abromeit", ...)

I want to find all the street names that contain a name from d_vector somewhere in the street name.

First, I made some general adjustments after importing the CSV data (as variable d):

d_vector <- unlist(d$name)
d_vector <- as.vector(as.matrix(d_vector))

What I tried so far

Then I tried grep, collapsing d_vector into one long string separated by | for a regex search:

result <- unique(grep(paste(d_vector, collapse="|"), muc$Name, value=TRUE, ignore.case = TRUE))
result

But the result returns all the street names.

I also tried agrep, which returned an out-of-memory error.
When I tried d_vector %in% muc$name, it returned just one TRUE and hundreds of FALSE, which doesn't seem right.

Do you have any suggestions where my mistake could lie, or which library I could use? I am looking for something like Python's "fuzzywuzzy" for R.


asked Jul 14 '16 10:07

Benedict Witzenberger

2 Answers

In principle, your solution works fine with some dummy data:

streets = c("Berberichweg", "Otto-Klemperer-Weg", "Feldmeierbogen", 
            "Konrad-Adenauer-Platz", "anotherThing")
patterns = c("weg", "platz")

unique(grep(paste(patterns, collapse="|"), streets, value=TRUE, ignore.case = TRUE))
# [1] "Berberichweg"          "Otto-Klemperer-Weg"    "Konrad-Adenauer-Platz"

I think something is not quite right with d_vector. Check class(d_vector), or post the output of dput(d_vector) here.
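One possible cause, offered as a guess since the real d_vector is not shown: a blank entry (for example an empty CSV cell) leaves an empty alternative in the collapsed pattern, and an empty alternative can match every string. The sketch below uses a made-up d_vector with such an entry and filters out blank and NA values before building the pattern:

```r
# Hypothetical stand-in for d_vector; "" mimics a blank CSV cell
d_vector <- c("Abel", "Abendroth", "", NA)
streets  <- c("Berberichweg", "Abelweg", "Feldmeierbogen")

# Drop NA entries, trim whitespace, then drop empty strings
clean <- trimws(d_vector[!is.na(d_vector)])
clean <- clean[nzchar(clean)]

# Rebuild the alternation from the cleaned names only
hits <- grep(paste(clean, collapse = "|"), streets,
             value = TRUE, ignore.case = TRUE)
hits
```

Running any(!nzchar(d_vector)) or anyNA(d_vector) on the real data would show whether this applies.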

You can also try using sapply and see if that will work:

matches = sapply(patterns, function(p) grep(p, streets, value=TRUE, ignore.case = TRUE))
# $weg
# [1] "Berberichweg"       "Otto-Klemperer-Weg"
# 
# $platz
# [1] "Konrad-Adenauer-Platz"

unique(unlist(matches))
# [1] "Berberichweg"          "Otto-Klemperer-Weg"    "Konrad-Adenauer-Platz"
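If approximate matching is wanted, base R's agrepl() can be applied one name at a time instead of to one huge collapsed pattern. This is only a sketch under assumptions: the data are made up, and max.distance = 1 (at most one edit) is an arbitrary choice. Short names tend to produce false positives with fuzzy matching:

```r
# Illustrative data; the first street name contains a deliberate typo
streets <- c("Otto-Klemmperer-Weg", "Feldmeierbogen")
d_names <- c("Klemperer", "Abel")

# One agrepl() call per name; each call is vectorised over the streets
fuzzy_matrix <- vapply(
  d_names,
  function(nm) agrepl(nm, streets, max.distance = 1, ignore.case = TRUE),
  logical(length(streets))
)

# Streets that approximately contain at least one name
fuzzy <- streets[rowSums(fuzzy_matrix) > 0]
fuzzy
```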

answered Oct 03 '22 11:10

Deena

Simple solution:

streets = c("Berberichweg", "Otto-Klemperer-Weg", "Feldmeierbogen" , "Altostraße")
streets = tolower(streets) #Lowercase all
names = c("Berber", "Weg")
names = tolower(names)

sapply(names, function (y) sapply(streets, function (x) grepl(y, x)))

#                   berber   weg
#berberichweg        TRUE  TRUE
#otto-klemperer-weg  FALSE TRUE
#feldmeierbogen      FALSE FALSE
#altostraße          FALSE FALSE
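Since grepl() is already vectorised over the strings it searches, the inner sapply can be dropped. The variant below is a sketch, not a replacement for the answer above: it uses fixed = TRUE so that each name is treated as a literal substring rather than a regular expression, and `names_to_find` is an illustrative variable name:

```r
streets <- c("Berberichweg", "Otto-Klemperer-Weg", "Feldmeierbogen", "Altostrasse")
names_to_find <- c("Berber", "Weg")

# One grepl() call per name; lowercase both sides for case-insensitive matching
hit_matrix <- vapply(
  tolower(names_to_find),
  function(nm) grepl(nm, tolower(streets), fixed = TRUE),
  logical(length(streets))
)
rownames(hit_matrix) <- streets

# Streets that contain at least one of the names
matched <- streets[rowSums(hit_matrix) > 0]
matched
```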

answered Oct 03 '22 11:10

catastrophic-failure


Tags: string-matching, pattern-matching, r