 

dplyr select using a logical vector

Tags:

r

dplyr

Can select in dplyr be used with a logical vector?

library(dplyr)

dat <- tbl_df(mtcars)
isNum <- sapply(dat, is.numeric)
select(dat, isNum)

Error in names(sel)[unnamed] <- sel[unnamed] : NAs are not allowed in subscripted assignments

Integer indices work, e.g. select(dat, (1:ncol(dat))[isNum]), so why not a logical vector?

When I saw helper functions for select() such as starts_with(), as in select(dat, starts_with("m")), I assumed they would work with a logical vector ...

asked Nov 11 '14 at 21:11 by Vincent


3 Answers

As Ben suggested, convert the logical vector to positions with which():

select(dat, which(isNum))
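A self-contained sketch of this approach, assuming a current dplyr. It uses the built-in iris data instead of mtcars (an assumption for illustration), because every mtcars column is numeric and the selection would change nothing. which() turns the TRUE/FALSE vector into the integer positions that select() accepts.

```r
library(dplyr)

# iris has four numeric columns and one factor column (Species)
dat <- as_tibble(iris)

# Logical vector: TRUE for each numeric column
isNum <- sapply(dat, is.numeric)

# which() converts the logical vector into integer positions for select()
num_dat <- select(dat, which(isNum))

names(num_dat)
```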

answered Oct 07 '22 at 21:10 by Vincent


My answers would be:

  • No ("Can select in dplyr be used with a logical vector?").

Evidence: (1) your example; (2) the help page for select():

...: Comma separated list of unquoted expressions. You can treat variable names like they are positions. Use positive values to select variables; use negative values to drop variables.

It doesn't say anything about logical vectors, sorry.

  • I don't know ("why not a logical?"): 'just because' (I don't think anyone but the developer could really answer this). You could put in a feature request ...
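The help text quoted above means positions work in both directions. A minimal sketch, assuming a current dplyr and using iris as an example data set (so that one column, Species, is non-numeric): keep the numeric columns with positive positions, or drop the non-numeric ones with negative positions. useNames = FALSE just strips the names that which() would otherwise attach.

```r
library(dplyr)

dat <- as_tibble(iris)
isNum <- sapply(dat, is.numeric)

# Positive positions: keep the columns where isNum is TRUE
kept <- select(dat, which(isNum, useNames = FALSE))

# Negative positions: drop the columns where isNum is FALSE (only Species here)
dropped <- select(dat, -which(!isNum, useNames = FALSE))

identical(names(kept), names(dropped))
```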

It's a little clunky, but

select_(dat,.dots=names(isNum)[isNum])

works (note that you need the select_ variant to allow using a character vector). But good old-fashioned

subset(dat,select=isNum)

seems to work fine too (unless it fails to play nicely with dplyr in some other way I haven't thought of).
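select_() and its .dots argument have since been deprecated. A sketch of the same two routes on a newer setup, assuming dplyr >= 1.0.0 (which provides all_of()) and iris as example data: the character-vector route passes the names through all_of(), while base subset() and plain [ indexing accept the logical vector directly.

```r
library(dplyr)

dat <- as_tibble(iris)
isNum <- sapply(dat, is.numeric)

# Character-vector route: all_of() takes the place of select_(.dots = ...)
by_name <- select(dat, all_of(names(isNum)[isNum]))

# Base R routes: both take the logical vector as-is
by_subset <- subset(dat, select = isNum)
by_bracket <- dat[, isNum]

identical(names(by_name), names(by_subset))
```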

If you look at the code of dplyr:::starts_with, you can see that it returns a vector of positions, not a logical vector:

function (vars, match, ignore.case = TRUE) 
{
    stopifnot(is.string(match), !is.na(match), nchar(match) > 
        0)
    if (ignore.case) 
        match <- tolower(match)
    n <- nchar(match)
    if (ignore.case) 
        vars <- tolower(vars)
    which(substr(vars, 1, n) == match)
}

I was going to suggest that you try to modify this function to create an is_numeric equivalent, but I don't understand the underlying magic sufficiently well ...
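One way to sidestep the helper machinery: since starts_with() ultimately returns positions, an ordinary function that returns positions can be called inside select(). The name numeric_positions() is hypothetical (it is not part of dplyr), and the sketch assumes a current dplyr with iris as example data.

```r
library(dplyr)

# Hypothetical helper (not part of dplyr): like starts_with(), it returns
# integer column positions rather than a logical vector
numeric_positions <- function(df) {
  which(vapply(df, is.numeric, logical(1)), useNames = FALSE)
}

dat <- as_tibble(iris)
num_dat <- select(dat, numeric_positions(dat))

names(num_dat)
```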

answered Oct 07 '22 at 23:10 by Ben Bolker


As stated very clearly in other answers, the response to your specific question is no. You cannot use a logical vector in dplyr::select().

However, more recent versions of dplyr (>= 0.5.0) include a new function, select_if(), that accepts either a predicate function to be applied to the columns or a logical vector.

Using select_if with a predicate function, your example could be simplified as follows:

tbl_df(mtcars) %>% dplyr::select_if(is.numeric)
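For comparison, a self-contained version of the predicate form. In dplyr 1.0.0 and later, select_if() is superseded by select() combined with where(); the sketch assumes such a version, uses as_tibble() in place of the deprecated tbl_df(), and takes iris as example data so that a column is actually dropped.

```r
library(dplyr)

dat <- as_tibble(iris)

# Predicate applied to every column; columns returning TRUE are kept
old_style <- select_if(dat, is.numeric)

# Newer equivalent: where() wraps the predicate inside select()
new_style <- select(dat, where(is.numeric))

identical(names(old_style), names(new_style))
```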

You can also pass select_if() a logical vector, which more directly addresses your use case:

dat <- tbl_df(mtcars)
isNum <- sapply(dat, is.numeric)
select_if(dat, isNum)
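
A runnable variant of the logical-vector call, assuming a current dplyr: as_tibble() stands in for tbl_df() (deprecated in dplyr 1.0.0) and iris stands in for mtcars so that something is actually dropped. The logical vector needs one element per column.

```r
library(dplyr)

dat <- as_tibble(iris)
isNum <- sapply(dat, is.numeric)

# select_if() keeps the columns whose entry in isNum is TRUE
num_dat <- select_if(dat, isNum)

names(num_dat)
```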

answered Oct 07 '22 at 21:10 by jackinovik