
R dplyr filter not masking base filter? [duplicate]

Tags: r, dplyr

At work, I have a Windows 7 computer running R 3.1.2.

I have a file called packages.R. In this file, I have the following code:

library(dplyr)
library(sqlutils)
library(RODBC)

My .Rprofile contains a function called .First.

.First <- function() {
    source("R/packages.R")
}

When I load R, I get the following output:

Loading required package: roxygen2
Loading required package: stringr
Loading required package: DBI

Attaching package: 'dplyr'

The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union

If you look at this carefully, you will see the filter from stats is not masked.

But if I take the exact same setup, comment out the library(dplyr) statement in packages.R, save the file, restart R, and then type it in by hand, I get this:

library(dplyr)

Attaching package: 'dplyr'

The following object is masked from 'package:stats':

    filter

The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union

Now it masks filter from package:stats.

I don't get it. I need to use the filter command from dplyr a lot for this project and I don't want to type dplyr::filter in order to use it. Could someone please help my weak mind understand why this is behaving this way? I have tried starting R in RStudio and ESS, and I get the exact same behavior in both. I also tried moving dplyr to the end of the packages.R file, with no difference to the results. I just want to mask stats::filter. Thanks.

Choens asked Nov 14 '14 16:11



1 Answer

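When you load libraries in .Rprofile, they get attached very early in the R startup process, before the stats package is attached. Because stats is attached afterwards, it ends up ahead of dplyr on the search path, so a bare filter resolves to stats::filter and no masking message is printed for it. The other way, you're attaching dplyr after stats has already been attached, so dplyr's filter masks the one from stats. You can learn about R's startup process by typing ?Startup. There it says: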

Note that when the site and user profile files are sourced only the base package is loaded, so objects in other packages need to be referred to by e.g. utils::dump.frames or after explicitly loading the package concerned.
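If you want to check which filter a bare call will pick up in a given session, something like the following should work. It is only a sketch using base R functions; the variable names pos and owner are mine, not anything from the question.

```r
# Position of each package on the search path; a lower index is searched first.
# An NA means the package is not attached in this session.
pos <- match(c("package:dplyr", "package:stats"), search())
names(pos) <- c("dplyr", "stats")
print(pos)

# Name of the namespace that owns whichever `filter` a bare call resolves to.
owner <- environmentName(environment(filter))
print(owner)
```

If owner comes back as "stats" after startup, then stats sits ahead of dplyr on the search path, which matches the behavior described in the question.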

I've seen Hadley recommend against loading packages in .Rprofile for this reason, i.e. the discrepancies in package loading order, although personally I don't have strong feelings about it.

One possible solution is to simply add library(stats) as the very first library call in your script, before loading dplyr.
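For the packages.R from the question, that would look roughly like this. I am assuming the file is still sourced from .First and that the three packages are installed; I have not tested it on R 3.1.2 specifically.

```r
# R/packages.R, sourced from .First() in .Rprofile

# Attach stats explicitly so it is already on the search path
# before dplyr is attached; dplyr then lands ahead of it.
library(stats)

library(dplyr)     # should now report that filter from package:stats is masked
library(sqlutils)
library(RODBC)
```

Since stats is already attached by the time R gets to its default packages, it should not be attached again on top of dplyr.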

Another (long term) option to avoid these sorts of issues more globally would be to transition your workflows from "a large collection of scripts" to one or more packages.

joran answered Nov 16 '22 12:11
