
dplyr::n() returns "Error: This function should not be called directly"

Tags: r, dplyr

If I do:

dplyr::mutate(MeanValue = mean(RSSI), ReadCount = n())

everything works fine. But when I try to qualify the function:

dplyr::mutate(MeanValue = mean(RSSI), ReadCount = dplyr::n())

I get the error mentioned in the title.

So, I do not really have a problem, I can just avoid doing that, but I'm curious about why it even happens. I already looked at another question (dplyr: "Error in n(): function should not be called directly"), but as far as I know, dplyr is the only library I'm using. I tried doing what the answer suggests anyway, but

detach(package:plyr)

results in

Error in detach(package:plyr) : invalid 'name' argument

and

conflicts()

does not mention n():

[1] "filter" "lag" "body<-" "intersect" "kronecker" "setdiff" "setequal" "union"

Most of these are caused by dplyr.

I guess I'm not the only one confused by this?

Silverclaw asked Sep 03 '16 09:09


3 Answers

"So, I do not really have a problem, I can just avoid [writing dplyr::n()], but I'm curious about why it even happens."

Here's the source code for dplyr::n in dplyr 0.5.0:

function () {
    stop("This function should not be called directly")
}

That's why the fully qualified form raises this error: the function does nothing except throw an error. (My guess is that the error-throwing function dplyr::n exists so that n() can have a typical documentation page with examples.)

Inside filter/mutate/summarise statements, n() does not call this function. Instead, some internal code calculates the group sizes for the expression n(). That's why, when dplyr is not attached, a bare n() fails at the top level but works inside a dplyr:: verb:

n()
#> Error: could not find function "n"

library(magrittr)
iris %>% 
  dplyr::group_by(Species) %>% 
  dplyr::summarise(n = n())
#> # A tibble: 3 × 2
#>      Species     n
#>       <fctr> <int>
#> 1     setosa    50
#> 2 versicolor    50
#> 3  virginica    50

At the top level, n() cannot be mapped to any function, so we get an error. But when it is used inside a dplyr verb, n() does map to something, and it returns the group sizes.
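To make the mechanism concrete, here is a base-R sketch of the general idea. It is an analogy, not dplyr's actual implementation: dplyr 0.5.0 handled n() in its own internal evaluation code. The hypothetical `toy_summarise()` evaluates an expression once per group in an environment where the bare symbol `n` is bound to a group-size function, while the hypothetical `pkg$n` plays the role of the exported stub. A bare `n()` is found in that environment first; `pkg$n()`, like `dplyr::n()`, fetches the stub directly and errors.

```r
# Hypothetical "package" environment holding an exported stub, analogous to
# dplyr::n in dplyr 0.5.0: calling it always fails.
pkg <- new.env()
pkg$n <- function() stop("This function should not be called directly")

# Hypothetical toy verb: evaluate `expr` once per group of df[[group]].
toy_summarise <- function(df, group, expr) {
  expr <- substitute(expr)          # capture the unevaluated expression
  caller <- parent.frame()
  pieces <- split(df, df[[group]])  # one data frame per group
  values <- vapply(pieces, function(piece) {
    # Evaluation environment: this group's columns plus a local n() that
    # reports the group size. Anything else is looked up from `caller`.
    mask <- list2env(as.list(piece), parent = caller)
    mask$n <- function() nrow(piece)
    as.numeric(eval(expr, mask))
  }, numeric(1))
  data.frame(group = names(values), value = unname(values))
}

# The bare symbol `n` is found in the mask first, so this works:
print(toy_summarise(iris, "Species", n()))

# `pkg$n` is fetched explicitly, bypassing the mask, so the stub runs:
res <- tryCatch(toy_summarise(iris, "Species", pkg$n()),
                error = function(e) conditionMessage(e))
print(res)
```

The point of the sketch is that only the unqualified symbol is intercepted; a qualified reference names one specific function, so whatever binding the verb provides is never consulted.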

TJ Mahr answered Oct 31 '22 17:10


I think this comes from masking between plyr and dplyr. Anyhow, this solves it:

dplyr::summarise(count = n())
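Whether plyr is involved can be checked rather than guessed. A minimal base-R sketch: `utils::find()` lists which entries on the search path provide a function named `n`, and checking `search()` first avoids the "invalid 'name' argument" error, which `detach()` raises when the named package is not attached. That error in the question suggests plyr was not attached in that session.

```r
# Which entries on the search path provide a function called `n`?
# find() returns them in search order, so if both "package:plyr" and
# "package:dplyr" appear, the first one is what a bare n() at top level finds.
where_n <- utils::find("n", mode = "function")
print(where_n)

# detach() stops with "invalid 'name' argument" when the package is not on
# the search path, so only detach plyr if it is actually attached.
plyr_attached <- "package:plyr" %in% search()
if (plyr_attached) {
  detach("package:plyr", unload = TRUE)
}
```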
Niv Cohen answered Oct 31 '22 19:10


I know I am 2 years late, but here’s my take.

The grouping in dplyr doesn’t actually do anything to the data; it just records that the data is grouped. This means functions like mean or n have to be evaluated with that context in mind, so that their calculations happen groupwise. They aren’t really ordinary R functions here, since ordinary R functions aren’t aware of this context. They are basically symbols that summarise() or mutate() choose to evaluate in a certain way (means or counts per group). I think Hadley chose to show an error if you call n() directly, as that’s slightly better than not having a function implemented at all.
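The idea that grouping is only a note on the data, with the verb doing the per-group work, can be sketched in base R. `toy_group_by()`, `toy_mutate()` and the `"toy_groups"` attribute are hypothetical illustrations, not dplyr's API or implementation: the first only attaches an attribute, and the second splits the row indices by that column and evaluates the expression once per group, with `n` bound to the group size. Here `mean()` is just base R's `mean()`, applied to each group's rows in turn. The column names mirror the question's `MeanValue` and `ReadCount`.

```r
# Hypothetical toy_group_by(): only records the grouping column as an
# attribute; the rows themselves are left untouched.
toy_group_by <- function(df, col) {
  attr(df, "toy_groups") <- col
  df
}

# Hypothetical toy_mutate(): evaluates `expr` once per group and writes the
# results back in the original row order.
toy_mutate <- function(df, name, expr) {
  expr <- substitute(expr)            # capture the unevaluated expression
  caller <- parent.frame()
  col <- attr(df, "toy_groups")
  out <- numeric(nrow(df))
  for (rows in split(seq_len(nrow(df)), df[[col]])) {
    # Evaluation environment: this group's columns plus an n() returning the
    # group size; other names (e.g. mean) are looked up starting at `caller`.
    mask <- list2env(as.list(df[rows, , drop = FALSE]), parent = caller)
    mask$n <- function() length(rows)
    out[rows] <- eval(expr, mask)     # a length-1 result is recycled
  }
  df[[name]] <- out
  attr(df, "toy_groups") <- col       # keep the grouping note
  df
}

grouped <- toy_group_by(iris, "Species")
result <- toy_mutate(grouped, "MeanValue", mean(Sepal.Length))
result <- toy_mutate(result, "ReadCount", n())
print(head(result[, c("Species", "MeanValue", "ReadCount")]))
```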

Dan Houghton answered Oct 31 '22 19:10