I need to call a custom function to do some calculations. The function contains an if statement that checks the input values, but my code doesn't return the values I expect.
I created a test data.frame:
library(dplyr)
df <- expand.grid(x = 2:4, y = 2:4, z = 2:4)
df$value <- df$x
df <- df %>% tbl_df %>% group_by(x, y)
test_fun1 just returns the sum of all values:
test_fun1 <- function(value) {
  return(sum(value))
}
df %>% summarize(t = test_fun1(value))
test_fun1 returns the results I expected:
Source: local data frame [4 x 3]
Groups: x
x y t
1 1 1 2
2 1 2 2
3 2 1 4
4 2 2 4
Then I added an if statement to check whether all values equal 2:
test_fun2 <- function(value) {
  if (all(value == 2)) {
    return(NA)
  }
  return(sum(value))
}
df %>% summarize(t = test_fun2(value))
But test_fun2 returns TRUE for groups whose values are greater than 2:
Source: local data frame [9 x 3]
Groups: x
x y t
1 2 2 NA
2 2 3 NA
3 2 4 NA
4 3 2 TRUE
5 3 3 TRUE
6 3 4 TRUE
7 4 2 TRUE
8 4 3 TRUE
9 4 4 TRUE
With test_fun3, which tests against a different value, the results are as expected:
test_fun3 <- function(value) {
  if (all(value != 3)) {
    return(sum(value))
  }
  return(NA)
}
df %>% summarize(t = test_fun3(value))
I get similarly correct results when I test against 4 or 5 instead.
Source: local data frame [9 x 3]
Groups: x
x y t
1 2 2 6
2 2 3 6
3 2 4 6
4 3 2 NA
5 3 3 NA
6 3 4 NA
7 4 2 12
8 4 3 12
9 4 4 12
In my real data I also got FALSE for groups that should not be NA, but I cannot create a reproducible example of that here.
Any ideas about this problem? Thanks for any suggestions.
sessionInfo()
R version 3.1.0 (2014-04-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
locale:
[1] LC_COLLATE=English_Australia.1252 LC_CTYPE=English_Australia.1252
[3] LC_MONETARY=English_Australia.1252 LC_NUMERIC=C
[5] LC_TIME=English_Australia.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] dplyr_0.2
loaded via a namespace (and not attached):
[1] assertthat_0.1.0.99 magrittr_1.0.1 parallel_3.1.0
[4] Rcpp_0.11.1 tools_3.1.0
The problem is that summarize tries to determine the class of the column from the result for the first group and applies this class to all other groups. The class of a plain NA is (unfortunately, in your case) logical. For more details, have a look at https://github.com/hadley/dplyr/issues/299
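To make the symptom concrete, here is a minimal base R sketch (no dplyr involved). It only shows the types of the different NA constants and what happens when integer sums are coerced to logical. The variable names na_classes and coerced are made up for illustration, and whether dplyr performs exactly this coercion internally is an assumption on my part.

```r
# Types of the different NA constants in base R
na_classes <- c(class(NA), class(NA_integer_), class(NA_real_))
na_classes  # "logical" "integer" "numeric"

# Coercing integer sums to logical: non-zero becomes TRUE, zero becomes FALSE
coerced <- as.logical(c(9L, 12L, 0L))
coerced  # TRUE TRUE FALSE
```

If the column really is treated as logical, this would also be consistent with seeing FALSE whenever a group's sum happens to be 0.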
I would suggest working around this by returning a typed NA, here NA_integer_ because the sums are integers. See also ?NA
test_fun2 <- function(value) {
  if (all(value == 2)) {
    return(NA_integer_)
  }
  return(sum(value))
}
df %>% summarize(t = test_fun2(value))
Source: local data frame [9 x 3]
Groups: x
x y t
1 2 2 NA
2 2 3 NA
3 2 4 NA
4 3 2 9
5 3 3 9
6 3 4 9
7 4 2 12
8 4 3 12
9 4 4 12
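If the column might be double rather than integer, NA_integer_ would be the wrong type and NA_real_ would be needed instead. A possible variant, sketched below in base R, derives the missing value from the sum itself so both branches always return the same type. The name test_fun4 is hypothetical, and I have not run this inside summarize on dplyr 0.2.

```r
test_fun4 <- function(value) {
  total <- sum(value)
  if (all(value == 2)) {
    # Indexing with NA_integer_ yields one missing value of the same type as total
    return(total[NA_integer_])
  }
  total
}

typeof(test_fun4(c(2L, 2L, 2L)))  # "integer"
typeof(test_fun4(c(2, 2, 2)))     # "double"
```

It should be usable the same way as before: df %>% summarize(t = test_fun4(value))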