Count matches between two strings

I have two data frames:

df.1 <- data.frame(loc = c('A','B','C','C'), person = c(1,2,3,4), str = c("door / window / table", "window / table / toilet / vase ", "TV / remote / phone / window", "book / vase / car / chair"))

Thus,

  loc person                             str
1   A      1           door / window / table
2   B      2 window / table / toilet / vase 
3   C      3    TV / remote / phone / window
4   C      4       book / vase / car / chair

And,

df.2 <- data.frame(loc = c('A','B','C'), str = c("book / chair / car", " table / remote / vase ", "window"))

which gives,

  loc                     str
1   A      book / chair / car
2   B  table / remote / vase 
3   C                  window

I want to create a variable df.1$percentage that gives, for each row, the proportion of elements in df.1$str that also appear in df.2$str for the same loc:

  loc person                             str percentage
1   A      1           door / window / table       0.00
2   B      2 window / table / toilet / vase        0.50
3   C      3    TV / remote / phone / window       0.25
4   C      4       book / vase / car / chair       0.00

(1 has 0/3, 2 has 2/4 matches, 3 has 1/4, and 4 has 0/4)

Thanks!

As you might know, data.frame columns can also hold lists (see Create a data.frame where a column is a list). So you can split your str into lists of words:

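# trim outer whitespace and split on "/" with optional surrounding spaces,
# so that " table" and "vase " still match "table" and "vase"
df.1 <- transform(df.1, words.1 = I(strsplit(trimws(as.character(str)), "\\s*/\\s*")))
df.2 <- transform(df.2, words.2 = I(strsplit(trimws(as.character(str)), "\\s*/\\s*")))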
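
To see what such a list column looks like, here is a small self-contained sketch on a toy data frame (the name `toy` and its contents are made up for illustration, not taken from the question). Each row of the new column holds a character vector, and the vectors can differ in length:

```r
# toy data, invented for this illustration
toy <- data.frame(id = 1:2,
                  str = c("door / window / table", "TV / remote"),
                  stringsAsFactors = FALSE)

# I() keeps the list returned by strsplit() as one list column
toy <- transform(toy, words = I(strsplit(str, " / ")))

lengths(toy$words)  # number of words in each row
toy$words[[2]]      # the words stored in the second row
```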

Then merge your data:

m <- merge(df.1, df.2, by = "loc")

And simply compute the percentage using mapply:

transform(m, percentage = mapply(function(x, y) sum(x%in%y) / length(x),
                                 words.1, words.2))
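
If the goal is to add the column to df.1 directly and keep its row order, a variation on the same idea is sketched below. It swaps `merge` for a `match` lookup, which assumes each loc occurs only once in df.2, and it uses a hypothetical helper `split_words` (not part of the original answer) that tolerates stray spaces around the words:

```r
df.1 <- data.frame(loc = c("A", "B", "C", "C"), person = c(1, 2, 3, 4),
                   str = c("door / window / table",
                           "window / table / toilet / vase ",
                           "TV / remote / phone / window",
                           "book / vase / car / chair"),
                   stringsAsFactors = FALSE)
df.2 <- data.frame(loc = c("A", "B", "C"),
                   str = c("book / chair / car", " table / remote / vase ", "window"),
                   stringsAsFactors = FALSE)

# hypothetical helper: split on "/" and drop whitespace around each word
split_words <- function(s) strsplit(trimws(as.character(s)), "\\s*/\\s*")

words.1 <- split_words(df.1$str)
# pick the df.2 entry with the same loc for every row of df.1
# (assumes loc is unique in df.2)
words.2 <- split_words(df.2$str[match(df.1$loc, df.2$loc)])

# share of each row's words that also occur in the matching df.2 entry
df.1$percentage <- mapply(function(x, y) mean(x %in% y), words.1, words.2)
df.1
```

Here `mean(x %in% y)` is equivalent to `sum(x %in% y) / length(x)`, since `%in%` returns a logical vector.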

Someone can probably come up with a smarter solution, but here's a straightforward approach:

library(data.table)
dt1 = data.table(df.1, key = "loc") # set the key to match by loc
dt2 = data.table(df.2)

dt1[, percentage := dt1[dt2][, # merge
           # clean up spaces and convert to strings
           # (current data.table names dt2's clashing column i.str; older versions used str.1)
           `:=`(str = gsub(" ", "", as.character(str)),
                i.str = gsub(" ", "", as.character(i.str)))][,
           # calculate the percentage for each row
           lapply(1:.N, function(i) {
                tmp = strsplit(str, "/")[[i]];
                sum(tmp %in% strsplit(i.str, "/")[[i]])/length(tmp)
           })
   ]]

dt1
#   loc person                             str percentage
#1:   A      1           door / window / table          0
#2:   B      2 window / table / toilet / vase         0.5
#3:   C      3    TV / remote / phone / window       0.25
#4:   C      4       book / vase / car / chair          0
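
With a recent data.table, the same result can probably be written more compactly as an update join. The sketch below is one possible form, not the answer's original method: it assumes a data.table version that supports `on =` and the `i.` prefix for columns coming from the joined table, and `split_words` is a hypothetical helper introduced here. It yields a plain numeric column rather than a list column:

```r
library(data.table)

dt1 <- data.table(loc = c("A", "B", "C", "C"), person = c(1, 2, 3, 4),
                  str = c("door / window / table",
                          "window / table / toilet / vase ",
                          "TV / remote / phone / window",
                          "book / vase / car / chair"))
dt2 <- data.table(loc = c("A", "B", "C"),
                  str = c("book / chair / car", " table / remote / vase ", "window"))

# hypothetical helper: split on "/" and drop whitespace around each word
split_words <- function(s) strsplit(trimws(as.character(s)), "\\s*/\\s*")

# update join: for every dt1 row matched by loc, str is dt1's column
# and i.str is the column of the same name from dt2
dt1[dt2, on = "loc",
    percentage := mapply(function(x, y) mean(x %in% y),
                         split_words(str), split_words(i.str))]
dt1
```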

Tags: r

Lucarno

2 Answers

flodel

eddi
