
Subset dataframe with list of columns in R

I want to select the columns of my data frame whose names I have stored in a string variable. For example:

v1 <- rnorm(100)
v2 <- rnorm(100)
v3 <- rnorm(100)
df <- data.frame(v1,v2,v3)

I want to accomplish the following:

df[,c('v1','v2')]

But I want to use a variable instead of c('v1','v2'). These all fail:

select.me <- "'v1','v2'"
df[,select.me]
df[,c(select.me)]
df[,c(paste(select.me,sep=''))]

Thanks for help with a simple question,

mike asked Nov 30 '12 01:11



1 Answer

The great irony here is that when you said "I want to do this" the first expression should have succeeded,

df[,c('v1','v2')]
> str( df[,c('v1','v2')] )
'data.frame':   100 obs. of  2 variables:
 $ v1: num  -0.3347 0.2113 0.9775 -0.0151 -1.8544 ...
 $ v2: num  -1.396 -0.95 -1.254 0.822 0.141 ...

whereas all the later attempts would fail, because "'v1','v2'" is a single string that R treats as the name of one (nonexistent) column. I later realized that you didn't know that you could use select.me <- c('v1','v2') ; df[ , select.me]. You could also use these forms, which might be safer in some instances:

df[ , names(df) %in% select.me] # logical indexing
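# grep()/grepl() use only the first element of a pattern vector, so collapse the names into one anchored regex
df[ , grep(paste0("^(", paste(select.me, collapse="|"), ")$"), names(df)) ]  # numeric indexing
df[ , grepl(paste0("^(", paste(select.me, collapse="|"), ")$"), names(df)) ]  # logical indexing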

Any of those can be used with negation ( !logical ) or minus ( -numeric ) to retrieve the complement, whereas you cannot use character indexing with negation. If you wanted to go down one level in understandability and were willing to change the select.me value to a string holding a valid R expression, you could do this:

select.me <- "c('v1','v2')"
df[ , eval(parse(text=select.me)) ]

Not that I recommend this... it is just to let you know that such a thing is possible after you "learn to walk". It would also have been possible (although rather baroque) to pull the names out of your original quoted string, though I think this just illustrates why your first version is superior:

select.me <- "'v1','v2'"
df [ , scan(textConnection(select.me), what="", sep=",") ]
> str( df [ , scan(textConnection(select.me), what="", sep=",") ] )
Read 2 items
'data.frame':   100 obs. of  2 variables:
 $ v1: num  -0.3347 0.2113 0.9775 -0.0151 -1.8544 ...
 $ v2: num  -1.396 -0.95 -1.254 0.822 0.141 ...
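
To tie the pieces above together, here is a minimal sketch of the character-vector approach and of the complement selections mentioned earlier. The seed, the object names (by_name, rest_logical and so on), the use of which() for the numeric form, and the setdiff() variant are illustrative additions rather than part of the original answer.

```r
# Illustrative data frame; the seed is an assumption added for reproducibility.
set.seed(1)
df <- data.frame(v1 = rnorm(100), v2 = rnorm(100), v3 = rnorm(100))

# Store the column names as a character vector, not as one quoted string.
select.me <- c("v1", "v2")

by_name    <- df[ , select.me]                         # character indexing
by_logical <- df[ , names(df) %in% select.me]          # logical indexing
by_number  <- df[ , which(names(df) %in% select.me)]   # numeric indexing

# Complement: every column except those named in select.me.
# drop = FALSE keeps a data frame even when only one column remains.
rest_logical <- df[ , !(names(df) %in% select.me), drop = FALSE]
rest_numeric <- df[ , -which(names(df) %in% select.me), drop = FALSE]

# setdiff() reaches the same complement while staying with character indexing.
rest_by_name <- df[ , setdiff(names(df), select.me), drop = FALSE]
```

One caveat with the minus form: if nothing matches, which() returns an empty vector and negating it selects zero columns rather than all of them, so the logical form is probably the safer default.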

IRTFM answered Oct 13 '22 01:10