I have created a function in R based on the kind help of @Jim M.
When I run the function I get the error: Error: unknown column 'rawdata'. In the debugger I get the message: Rcpp::exception in eval(expr, envir, enclos): unknown column 'rawdata'
However, when I look at the environment window I can see the two variables I passed to the function, and both contain data: rawdata is a factor with 7 levels and refdata is a factor with 28 levels.
function (refdata, rawdata)
{
wordlist <- expand.grid(rawdata = rawdata, refdata = refdata, stringsAsFactors = FALSE)
wordlist %>% group_by(rawdata) %>% mutate(match_score = jarowinkler(rawdata, refdata)) %>%
summarise(match = match_score[which.max(match_score)], matched_to = refdata[which.max(match_score)])
}
This is the problem with functions that use NSE (non-standard evaluation). Such functions are very useful in interactive programming but cause many problems in development, i.e. when you try to use them inside other functions. Because the expressions are captured rather than evaluated directly, R cannot find the objects in the environments it searches. I suggest you read here, and preferably the scoping issues chapter, for more info.
First of all, you need to know that ALL the standard dplyr functions use NSE. Let's look at an example that approximates your problem:
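The scoping issue can be reproduced in base R alone with subset(), which also uses NSE. This is only a minimal sketch; toy and keep_rows are names made up for illustration:

```r
# Toy data, made up for this illustration
toy <- data.frame(col1 = rep(c("a", "b"), each = 5), col2 = 1:10)

# Interactive use: subset() looks up col2 inside toy, so this works
interactive_rows <- subset(toy, col2 > 5)

# The same call wrapped in a function (keep_rows is a hypothetical name)
keep_rows <- function(condition) {
  subset(toy, condition)
}

# Here `condition` is evaluated in the caller's environment, where no
# object called col2 exists, so the call fails. Capture the message
# instead of stopping the script.
wrapped <- tryCatch(
  keep_rows(col2 > 5),
  error = function(e) conditionMessage(e)
)
print(wrapped)
```

The interactive call returns the five rows with col2 above 5, while the wrapped call ends in an error saying that col2 was not found.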
Data:
df <- data.frame(col1 = rep(c('a','b'), each=5), col2 = runif(10))
> df
col1 col2
1 a 0.03366446
2 a 0.46698763
3 a 0.34114682
4 a 0.92125387
5 a 0.94511394
6 b 0.67241460
7 b 0.38168131
8 b 0.91107090
9 b 0.15342089
10 b 0.60751868
Let's see how NSE will make our simple problem crash:
First of all the simple interactive case works:
df %>% group_by(col1) %>% summarise(count = n())
Source: local data frame [2 x 2]
col1 count
1 a 5
2 b 5
Let's see what happens if I put it in a function:
lets_group <- function(column) {
df %>% group_by(column) %>% summarise(count = n())
}
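> lets_group(col1)
Error: index out of bounds
Not the same error as yours, but it is caused by NSE: group_by() does not use the value of the argument, it looks for a column literally named column, which df does not have. Exactly the same line of code worked outside the function.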
Fortunately, there is a solution to your problem, and that is standard evaluation (SE). Hadley also made versions of all the functions in dplyr that use standard evaluation. They are just the normal functions plus an underscore (_) at the end.
Now look at how this will work:
# notice the formula operator (~) inside summarise_
lets_group2 <- function(column) {
df %>% group_by_(column) %>% summarise_(count = ~n())
}
This yields the following result:
#also notice the quotes around col1
> lets_group2('col1')
Source: local data frame [2 x 2]
col1 count
1 a 5
2 b 5
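Note that from dplyr 0.7.0 onwards the underscore verbs are deprecated in favour of tidy evaluation. A minimal sketch of the same function in that style, assuming a dplyr version that provides the .data pronoun (lets_group3 is a name made up here, not part of dplyr):

```r
library(dplyr)

df <- data.frame(col1 = rep(c("a", "b"), each = 5), col2 = runif(10))

# The column name is still passed as a string; .data[[column]] looks it
# up inside the data frame, so no underscore verb is needed.
lets_group3 <- function(column) {
  df %>%
    group_by(.data[[column]]) %>%
    summarise(count = n())
}

# again, notice the quotes around col1
result <- lets_group3("col1")
print(result)
```

As with lets_group2, the caller passes the column name in quotes, and the result has one row per group with a count of 5 each.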
I cannot test your problem, but using SE instead of NSE should give you the results you want. For more info you can also read here.
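If it helps, here is a rough sketch of the same matching logic written with base R only, so no NSE is involved at all. It is not tested against your data. Since jarowinkler() comes from a package I am not loading here, edit_similarity is a hypothetical stand-in built on utils::adist() (plain edit distance, not Jaro-Winkler); best_matches and score_fn are also names made up for this sketch, and you could pass jarowinkler as score_fn instead.

```r
# Hypothetical stand-in for jarowinkler(): a similarity in [0, 1]
# derived from edit distance. This is NOT the Jaro-Winkler measure.
edit_similarity <- function(a, b) {
  d <- mapply(function(x, y) utils::adist(x, y)[1, 1], a, b,
              USE.NAMES = FALSE)
  1 - d / pmax(nchar(a), nchar(b), 1)
}

best_matches <- function(refdata, rawdata, score_fn = edit_similarity) {
  # as.character() guards against factor inputs
  wordlist <- expand.grid(rawdata = as.character(rawdata),
                          refdata = as.character(refdata),
                          stringsAsFactors = FALSE)
  wordlist$match_score <- score_fn(wordlist$rawdata, wordlist$refdata)

  # For each rawdata value keep the row with the highest score
  best <- lapply(split(wordlist, wordlist$rawdata), function(g) {
    g[which.max(g$match_score), ]
  })
  out <- do.call(rbind, best)
  rownames(out) <- NULL
  names(out)[names(out) == "refdata"] <- "matched_to"
  out
}

matches <- best_matches(refdata = c("apple", "banana"),
                        rawdata = c("appel", "banan"))
print(matches)
```

Columns are addressed with $ and quoted names throughout, so the function behaves the same inside or outside another function.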