
Why doesn't subset mind a missing subset argument for data frames?

Normally I wonder where mysterious errors come from but now my question is where a mysterious lack of error comes from.

Let

numbers <- c(1, 2, 3)
frame <- as.data.frame(numbers)

If I type

subset(numbers, )

(so I want to take some subset but forget to specify the subset-argument of the subset function) then R reminds me (as it should):

Error in subset.default(numbers, ) :
argument "subset" is missing, with no default

However when I type

subset(frame,)

(so the same thing with a data.frame instead of a vector), it doesn't give an error but instead just returns the (full) dataframe.

What is going on here? Why don't I get my well deserved error message?

Vincent asked Jan 05 '23 10:01


1 Answer

tl;dr: The subset function calls different functions (has different methods) depending on the type of object it is fed. In the example above, subset(numbers, ) uses subset.default while subset(frame, ) uses subset.data.frame.


R has several object-oriented systems built in. The simplest and most common is called S3. This OO programming style implements what Wickham calls "generic-function OO." Under this style of OO, a function called a generic function looks at the class of an object and then applies the matching method to the object. If no method exists for the object's class, the generic falls back to a default method, provided one has been defined.

To get a better idea of how S3 and the other OO systems work, you might check out the relevant portion of the Advanced R site. The procedure of finding the proper method for an object is referred to as method dispatch. You can read more about this in the help file ?UseMethod.

As noted in the Details section of ?subset, the subset function "is a generic function." This means that subset examines the class of the object in the first argument and then uses method dispatch to apply the appropriate method to the object.
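To make method dispatch concrete, here is a minimal sketch using a made-up generic. The name describe and its two methods are hypothetical and exist only for illustration; they are not part of base R. The sketch only shows that the same call reaches different functions depending on the class of the first argument.

```r
# Hypothetical generic; UseMethod() picks a method from the class of x.
describe <- function(x, ...) UseMethod("describe")

# Fallback, used when no class-specific method is found.
describe.default <- function(x, ...) "default method"

# Method used for objects whose class includes "data.frame".
describe.data.frame <- function(x, ...) "data.frame method"

numbers <- c(1, 2, 3)
frame <- as.data.frame(numbers)

res_vector <- describe(numbers)  # no describe.numeric exists, so the default is used
res_frame  <- describe(frame)    # class(frame) is "data.frame"

print(res_vector)
print(res_frame)
```

subset behaves the same way: a numeric vector ends up in subset.default, while a data.frame ends up in subset.data.frame.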

The methods of a generic function are encoded as

<generic function name>.<class name>

and can be found using methods(<generic function name>). For subset, we get

methods(subset)
[1] subset.data.frame subset.default    subset.matrix    
see '?methods' for accessing help and source code

which indicates that if the object has the data.frame class, then subset calls the subset.data.frame method (function). It is defined as below:

subset.data.frame
function (x, subset, select, drop = FALSE, ...) 
{
    r <- if (missing(subset)) 
        rep_len(TRUE, nrow(x))
    else {
        e <- substitute(subset)
        r <- eval(e, x, parent.frame())
        if (!is.logical(r)) 
            stop("'subset' must be logical")
        r & !is.na(r)
    }
    vars <- if (missing(select)) 
        TRUE
    else {
        nl <- as.list(seq_along(x))
        names(nl) <- names(x)
        eval(substitute(select), nl, parent.frame())
    }
    x[r, vars, drop = drop]
}

Note that if the subset argument is missing, the first lines

    r <- if (missing(subset)) 
        rep_len(TRUE, nrow(x))

produce a vector of TRUE values with one element per row of the data.frame, and the last line

    x[r, vars, drop = drop]

feeds this vector into the row index, which means that if you do not include a subset argument, the subset function returns all of the rows of the data.frame.
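The two quoted lines can be reproduced by hand. The following is only a sketch of that one code path, using the numbers and frame objects from the question; the variable names r and all_rows are chosen here for illustration.

```r
numbers <- c(1, 2, 3)
frame <- as.data.frame(numbers)

# What subset.data.frame computes when 'subset' is missing:
# one TRUE per row of the data.frame.
r <- rep_len(TRUE, nrow(frame))

# 'select' is missing too, so vars is TRUE and every column is kept.
all_rows <- frame[r, TRUE, drop = FALSE]

# Both the manual version and subset(frame, ) keep every row.
print(nrow(all_rows) == nrow(frame))
print(nrow(subset(frame, )) == nrow(frame))
```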

As we can see from the output of the methods call, subset does not have a method for atomic vectors. This means, as your error message

Error in subset.default(numbers, )

shows, that when you apply subset to a vector, R calls the subset.default method, which is defined as

subset.default
function (x, subset, ...) 
{
    if (!is.logical(subset)) 
        stop("'subset' must be logical")
    x[subset & !is.na(subset)]
}

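Unlike subset.data.frame, the subset.default function has no missing(subset) check. Its first line, is.logical(subset), evaluates the subset argument, and because that argument is missing and has no default, R itself raises the error shown in the question. The stop call is never reached; it would have produced the message "'subset' must be logical" instead.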
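A small sketch of where the message in the question comes from. The functions unguarded and guarded are hypothetical stand-ins, not base R code: the first evaluates its subset argument straight away, as the first line of subset.default does, and the second checks missing(subset) first, as subset.data.frame does.

```r
# Hypothetical function: uses 'subset' without checking whether it was supplied.
unguarded <- function(x, subset) {
    is.logical(subset)  # evaluating a missing argument makes R signal an error
}

# Hypothetical function: checks missing() before touching 'subset'.
guarded <- function(x, subset) {
    if (missing(subset)) return(x)
    x[subset]
}

# Capture the error message instead of stopping the script.
msg <- tryCatch(unguarded(1:3), error = function(e) conditionMessage(e))
print(msg)

# With the guard in place, the input is returned unchanged.
print(guarded(1:3))
```

The captured message is the same "is missing, with no default" text that subset(numbers, ) produces, and it is raised by R's argument evaluation rather than by an explicit stop call.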

lmo answered Jan 08 '23 09:01