R - count matches between characters of one string and another, no replacement

Question

I have a keyword (e.g. 'green') and some text ("I do not like them Sam I Am!").

I'd like to see how many of the characters in the keyword ('g', 'r', 'e', 'e', 'n') occur in the text (in any order).

In this example the answer is 3 - the text doesn't have a G or R but has two Es and an N.

The complication is that once a character in the text has been matched with a character in the keyword, it can't be used to match another character in the keyword.

For example, if my keyword was 'greeen', the number of "matching characters" is still 3 (one N and two Es) because there are only two Es in the text, not 3 (to match the third E in the keyword).

How can I write this in R? This is just ticking something at the edge of my memory - I feel like it's a common problem but just worded differently (sort of like sampling with no replacement, but "matches with no replacement"?).

E.g.

keyword <- strsplit('greeen', '')[[1]]
text <- strsplit('idonotlikethemsamiam', '')[[1]]
# how many characters in keyword have matches in text,
# with no replacement?
# Attempt 1: sum(keyword %in% text)
# PROBLEM: returns 4 (all three Es match, but only two in text)

More examples of expected input/outputs (keyword, text, expected output):

'green', 'idonotlikethemsamiam', 3 (E, E, N)
'greeen', 'idonotlikethemsamiam', 3 (E, E, N)
'red', 'idonotlikethemsamiam', 2 (E and D)

N8TRO · Accepted Answer

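The function pmatch() works well for this. With its default duplicates.ok = FALSE, each element of the text can be matched at most once, which gives the "no replacement" behaviour. pmatch() returns NA for every keyword character that has no match, and although it would be instinctive to count with length(), length() has no na.rm option. To work around this, sum(!is.na()) is used.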

keyword <- unlist(strsplit('greeen', ''))
text <- unlist(strsplit('idonotlikethemsamiam', ''))

sum(!is.na(pmatch(keyword, text)))

# [1] 3

keyword2 <- unlist(strsplit("red", ''))
sum(!is.na(pmatch(keyword2, text)))

# [1] 2
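
To check the approach against all three examples from the question, the idiom above can be wrapped in a small function. This is only a sketch: the names count_matches and count_matches_tab are hypothetical helpers, not part of base R or of the original answer. The second function is a different technique, shown for comparison: it counts each character with table() and sums the smaller of the two counts for every shared character, so it does not rely on pmatch() at all.

```r
# Hypothetical helper: wraps the pmatch() idiom from the answer above.
count_matches <- function(keyword, text) {
  kw  <- strsplit(keyword, "")[[1]]
  txt <- strsplit(text, "")[[1]]
  # With the default duplicates.ok = FALSE, each element of txt
  # can be matched at most once; unmatched characters give NA.
  sum(!is.na(pmatch(kw, txt)))
}

# Hypothetical alternative: compare per-character counts instead.
count_matches_tab <- function(keyword, text) {
  kw  <- table(strsplit(keyword, "")[[1]])
  txt <- table(strsplit(text, "")[[1]])
  # Characters that occur in both strings
  shared <- intersect(names(kw), names(txt))
  # Each shared character contributes the smaller of its two counts
  sum(pmin(as.vector(kw[shared]), as.vector(txt[shared])))
}

text <- "idonotlikethemsamiam"
count_matches("green", text)       # [1] 3
count_matches("greeen", text)      # [1] 3
count_matches("red", text)         # [1] 2
count_matches_tab("greeen", text)  # [1] 3
```

Both versions take the two strings directly rather than pre-split character vectors, which is a design choice made here for convenience.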

Tags:

r

mathematical.coffee

1 Answer

N8TRO
