I used R's max function together with the summarise function from the dplyr package and had a typo in the name of max's na.rm argument. I mistakenly wrote ns.rm = T, and the script ran without any warning message and returned wrong values.
When I replace na.rm with ns.rm on a simple vector (outside dplyr), the function returns the right value, and if the input vector contains NA it returns NA, again without any warning that an unknown argument was used.
Here is an example:
if(!require('magrittr')) install.packages('magrittr')
if(!require('dplyr')) install.packages('dplyr')
tab <- data.frame("grp1" = sort(rep(1:4, 5)),
"grp2" = rep(c(1:2), 10),
"val" = rnorm(20))
tab
grp1 grp2 val
1 1 1 0.03536351
2 1 2 1.04237251
3 1 1 0.82735937
4 1 2 0.29040424
5 1 1 0.30194926
6 2 2 -0.96649026
7 2 1 -0.97388257
8 2 2 -0.13111541
9 2 1 -0.48337864
10 2 2 -0.73471857
11 3 1 -0.88536656
12 3 2 -1.30442575
13 3 1 1.18816751
14 3 2 -0.90334058
15 3 1 -0.53102641
16 4 2 -0.69266762
17 4 1 -0.64776312
18 4 2 0.01354644
19 4 1 0.78058285
20 4 2 -0.06647959
### Using max function within dplyr
## Right way
tab %>%
group_by(grp1, grp2) %>%
summarise("max_val" = max(val, na.rm = T))
# A tibble: 8 x 3
# Groups: grp1 [4]
grp1 grp2 max_val
<int> <int> <dbl>
1 1 1 0.827
2 1 2 1.04
3 2 1 -0.483
4 2 2 -0.131
5 3 1 1.19
6 3 2 -0.903
7 4 1 0.781
8 4 2 0.0135
## with a typo in na.rm argument
tab %>%
group_by(grp1, grp2) %>%
summarise("max_val" = max(val, ns.rm = T))
# A tibble: 8 x 3
# Groups: grp1 [4]
grp1 grp2 max_val
<int> <int> <dbl>
1 1 1 1
2 1 2 1.04
3 2 1 1
4 2 2 1
5 3 1 1.19
6 3 2 1
7 4 1 1
8 4 2 1
### Using max function on a vector
max(c(1, 2, 3), ns.rm = T)
[1] 3
max(c(1, 2, 3), na.rm = T)
[1] 3
max(c(1, 2, 3, NA), ns.rm = T)
[1] NA
max(c(1, 2, 3, NA), na.rm = T)
[1] 3
Does anybody know whether ns.rm is a legitimate argument of any R function? If not, why is there no warning that an unrecognized argument was supplied?
No, ns.rm is not a legitimate argument. What happens here is that ns.rm = T does not match na.rm, so it is absorbed by the ... argument of max and treated as one more value to compare, with T (TRUE) coerced to 1.
max(c(1, 2, 3), ns.rm = T)
#[1] 3
is actually interpreted as
max(c(1, 2, 3), 1)
#[1] 3
and
max(c(0.1, 0.2, 0.33), ns.rm = T)
#[1] 1
is interpreted as
max(c(0.1, 0.2, 0.33), 1)
and
max(c(1, 2, 3, NA), ns.rm = T)
#[1] NA
is actually
max(c(1, 2, 3, NA), 1)
#[1] NA
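One way to see the mechanism, as a minimal sketch: max has the signature max(..., na.rm = FALSE), so any named argument that does not match na.rm exactly is collected into ... along with the data. The helper show_dots below is hypothetical; it only mimics that signature and reports what ends up where.

```r
# Hypothetical stand-in with the same signature shape as max(..., na.rm = FALSE).
# It returns what was captured in ... and the value na.rm ended up with.
show_dots <- function(..., na.rm = FALSE) {
  list(dots = list(...), na.rm = na.rm)
}

typo <- show_dots(c(1, 2, 3), ns.rm = TRUE)
names(typo$dots)   # the misspelled name travels inside ...
typo$na.rm         # na.rm itself keeps its default

ok <- show_dots(c(1, 2, 3), na.rm = TRUE)
length(ok$dots)    # only the vector is in ...
ok$na.rm           # na.rm was matched by name
```

Because arguments that come after ... must be matched by their exact name, a misspelling is never an error for such a function; it is simply more input.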
Similarly, for the dataframe
set.seed(123)
tab <- data.frame(grp1 = sort(rep(1:4, 5)),
grp2 = rep(c(1:2), 10),
val = rnorm(20))
With the correct argument name, we get
library(dplyr)
tab %>% group_by(grp1, grp2) %>% summarise(max_val = max(val, na.rm = T))
# grp1 grp2 max_val
# <int> <int> <dbl>
#1 1 1 1.56
#2 1 2 0.0705
#3 2 1 0.461
#4 2 2 1.72
#5 3 1 1.22
#6 3 2 0.360
#7 4 1 0.701
#8 4 2 1.79
Now if we use ns.rm = T
tab %>% group_by(grp1, grp2) %>% summarise(max_val = max(val, ns.rm = T))
# grp1 grp2 max_val
# <int> <int> <dbl>
#1 1 1 1.56
#2 1 2 1
#3 2 1 1
#4 2 2 1.72
#5 3 1 1.22
#6 3 2 1
#7 4 1 1
#8 4 2 1.79
Notice that every group whose max_val was less than 1 in the first result now shows 1 when ns.rm is used, because T is interpreted as 1 and becomes the largest value in those groups.
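The same comparison can be checked without dplyr. The sketch below rebuilds the data with the same seed and uses base R's tapply for the grouping; the variable names groups, max_ok and max_typo are made up for illustration.

```r
set.seed(123)
tab <- data.frame(grp1 = sort(rep(1:4, 5)),
                  grp2 = rep(1:2, 10),
                  val  = rnorm(20))

# One factor level per (grp1, grp2) combination.
groups <- interaction(tab$grp1, tab$grp2)

# Per-group maximum with the correct argument and with the typo.
max_ok   <- tapply(tab$val, groups, function(v) max(v, na.rm = TRUE))
max_typo <- tapply(tab$val, groups, function(v) max(v, ns.rm = TRUE))

# The typo version is the correct maximum raised to at least 1.
same <- all(as.vector(max_typo) == as.vector(pmax(max_ok, 1)))
same   # expected to be TRUE
```

In other words, max(val, ns.rm = T) behaves like max(val, 1) within each group.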
Also, note that this is not limited to ns.rm; any argument name other than na.rm behaves the same way.
max(c(0.1, 0.2, 0.33), a = T)
#[1] 1
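If you want such typos to fail loudly, one option is a small wrapper that refuses named arguments in .... This is only a sketch: strict_max is a hypothetical name, not a function in base R or dplyr, and it assumes you never intend to pass named values to max on purpose.

```r
# Hypothetical helper: a stricter max that rejects any named argument
# other than na.rm instead of silently treating it as data.
strict_max <- function(..., na.rm = FALSE) {
  dot_names <- names(list(...))
  bad <- dot_names[nzchar(dot_names)]
  if (length(bad) > 0) {
    stop("unexpected named argument(s): ", paste(bad, collapse = ", "))
  }
  max(..., na.rm = na.rm)
}

strict_max(c(0.1, 0.2, 0.33), na.rm = TRUE)

# The typo now raises an error; tryCatch is used here only to show the message.
msg <- tryCatch(strict_max(c(0.1, 0.2, 0.33), ns.rm = TRUE),
                error = function(e) conditionMessage(e))
msg
```

The trade-off is that legitimate calls with named values, such as max(a = 1, b = 2), would also be rejected by this wrapper.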