
match() versus %in% operator

Tags: r, match

From what I read in ?match, %in% is defined as:

"%in%" <- function(x, table) match(x, table, nomatch = 0) > 0

Why do I get a different result when using match(x, dict[["word"]], 0L):

vapply(strsplit(df$text, " "), 
   function(x) sum(dict[["score"]][match(x, dict[["word"]], 0L)]), 1)
#[1]  2 -2  3 -2

versus when using dict[["word"]] %in% x?

vapply(strsplit(df$text, " "), 
       function(x) sum(dict[["score"]][dict[["word"]] %in% x]), 1)
#[1]  2 -2  1 -1

Data

library(dplyr)
# tibble() replaces the deprecated data_frame()
df <- tibble(text = c("I love pandas", "I hate monkeys", 
                      "pandas pandas pandas", "monkeys monkeys"))
dict <- tibble(word = c("love", "hate", "pandas", "monkeys"),
               score = c(1, -1, 1, -1))

Update

After Richard's explanation, I now understand my initial misconception. The %in% operator returns a logical vector:

> sapply(strsplit(df$text, " "), function(x) dict[["word"]] %in% x)
      [,1]  [,2]  [,3]  [,4]
[1,]  TRUE FALSE FALSE FALSE
[2,] FALSE  TRUE FALSE FALSE
[3,]  TRUE FALSE  TRUE FALSE
[4,] FALSE  TRUE FALSE  TRUE

And match() returns location numbers:

> sapply(strsplit(df$text, " "), function(x) match(x, dict[["word"]], 0L))
[[1]]
[1] 0 1 3

[[2]]
[1] 0 2 4

[[3]]
[1] 3 3 3

[[4]]
[1] 4 4

Steven Beaupré asked Jan 23 '15 06:01


1 Answer

match() returns an integer vector giving, for each element of x, the position of its first match in table; that position is greater than 1 whenever the match is not the first element of table.

%in% returns a logical vector where a match (TRUE) is always 1 (when represented as an integer).

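Hence, the sums in your calculations can differ: indexing with match(x, dict[["word"]], 0L) picks a score once for every word in x, repeats included, while indexing with dict[["word"]] %in% x picks each dictionary score at most once.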
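
A minimal sketch of that difference in base R. The vectors words, scores and x are hypothetical stand-ins for dict[["word"]], dict[["score"]] and one split sentence from the question:

```r
# Hypothetical stand-ins for the question's dictionary and one split sentence
words  <- c("love", "hate", "pandas", "monkeys")
scores <- c(1, -1, 1, -1)
x      <- c("pandas", "pandas", "pandas")

# match(): one position per element of x, so a repeated word repeats its index
idx <- match(x, words, nomatch = 0L)
idx                # 3 3 3
sum(scores[idx])   # 3

# %in% with the dictionary on the left: one TRUE/FALSE per dictionary word,
# so each score is selected at most once
mask <- words %in% x
mask               # FALSE FALSE  TRUE FALSE
sum(scores[mask])  # 1

# Keeping x on the left side of %in% preserves the repeats
sum(scores[match(x[x %in% words], words)])  # 3
```

Which form to use probably depends on whether repeated words should count more than once: the match() form counts every occurrence, the logical mask only records presence.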

Rich Scriven answered Oct 01 '22 06:10