From what I read in ?match(), %in% is defined as:
"%in%" <- function(x, table) match(x, table, nomatch = 0) > 0
Why do I get a different result when using match(x, dict[["word"]], 0L):
vapply(strsplit(df$text, " "),
       function(x) sum(dict[["score"]][match(x, dict[["word"]], 0L)]), 1)
#[1] 2 -2 3 -2
versus when using dict[["word"]] %in% x?
vapply(strsplit(df$text, " "),
       function(x) sum(dict[["score"]][dict[["word"]] %in% x]), 1)
#[1] 2 -2 1 -1
Data
library(dplyr)
# tibble() replaces the deprecated data_frame(); dplyr re-exports it
df <- tibble(text = c("I love pandas", "I hate monkeys",
                      "pandas pandas pandas", "monkeys monkeys"))
dict <- tibble(word = c("love", "hate", "pandas", "monkeys"),
               score = c(1, -1, 1, -1))
Update
After Richard's explanation, I now understand my initial misconception. The %in%
operator returns a logical vector:
> sapply(strsplit(df$text, " "), function(x) dict[["word"]] %in% x)
[,1] [,2] [,3] [,4]
[1,] TRUE FALSE FALSE FALSE
[2,] FALSE TRUE FALSE FALSE
[3,] TRUE FALSE TRUE FALSE
[4,] FALSE TRUE FALSE TRUE
And match() returns location numbers:
> sapply(strsplit(df$text, " "), function(x) match(x, dict[["word"]], 0L))
[[1]]
[1] 0 1 3
[[2]]
[1] 0 2 4
[[3]]
[1] 3 3 3
[[4]]
[1] 4 4
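match() returns an integer vector giving, for each element of x, the position of its first match in table; that position is greater than 1 whenever the match is not the first element of table.
%in% returns a logical vector in which a match (TRUE) is always 1 when represented as an integer.
Used as subscripts, the two also behave differently: positions pick one score per element of x (repeats included, zeros dropped), while a logical vector picks each dictionary entry at most once.
Hence, the sums in your calculations will likely differ.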
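A minimal sketch of the difference, using plain base R vectors that mirror the question's dict. The names words, scores, x, pos, and hit are made up for illustration and are not part of the original code:

```r
# Toy dictionary mirroring the question's data (illustrative names, base R only)
words  <- c("love", "hate", "pandas", "monkeys")
scores <- c(1, -1, 1, -1)
x      <- c("pandas", "pandas", "pandas")

# match(): one position per element of x, so repeats are kept
pos <- match(x, words, nomatch = 0L)
pos
# [1] 3 3 3
sum(scores[pos])   # the score at position 3 is picked three times
# [1] 3

# %in% with the dictionary on the left: one TRUE/FALSE per dictionary word
hit <- words %in% x
hit
# [1] FALSE FALSE  TRUE FALSE
sum(scores[hit])   # each dictionary word is counted at most once
# [1] 1

# A zero index selects nothing, so unmatched words simply drop out
scores[c(0L, 1L, 3L)]
# [1] 1 1
```

So the third sentence scores 3 with match() and 1 with %in%, which is the gap seen in the two vapply() results.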