
Determine which column name is causing 'undefined columns selected' error when using subset()

Tags: r, subset

I'm trying to extract a large data frame from a very large data frame, using

data.new <- subset(data, select = vector)

where vector is a character vector containing the names of the columns I'm trying to isolate. When I do this I get

Error in `[.data.frame`(x, r, vars, drop = drop) : 
  undefined columns selected

Is there a way to identify which specific column name in the vector is undefined? Through trial and error I've narrowed it down to about 400, but that still doesn't help.

dbertolatus asked Dec 10 '15 19:12


1 Answer

Find the elements of your vector that are not %in% the names() of your data frame.

Working example:

dd <- data.frame(a=1,b=2)
subset(dd,select=c("a"))
##   a
## 1 1

Now try something that doesn't work:

v <- c("a","d")
subset(dd,select=v)
## Error in `[.data.frame`(x, r, vars, drop = drop) : 
##    undefined columns selected

v[!v %in% names(dd)]
## [1] "d"

Or

setdiff(v,names(dd))
## [1] "d"

The last few lines of the example code in ?match show a similar case.
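
For a vector of several hundred names, it may help to wrap the same check in a small function that reports every missing name at once. The following is only a sketch: select_checked is a hypothetical helper name, not a base R function, and it assumes the column names are supplied as a character vector.

```r
# Hypothetical helper (not part of base R): stop with a message that lists
# every requested column that is absent, otherwise return the selected columns.
select_checked <- function(df, cols) {
  absent <- setdiff(cols, names(df))
  if (length(absent) > 0) {
    stop("columns not found in data frame: ",
         paste(absent, collapse = ", "), call. = FALSE)
  }
  df[, cols, drop = FALSE]
}

dd <- data.frame(a = 1, b = 2)
v <- c("a", "d")

# Positions within v of the names that are absent from dd
which(!v %in% names(dd))

# Keep only the names that do exist, preserving the order of v
present <- intersect(v, names(dd))
res <- select_checked(dd, present)

# With the full vector, the error message names the offending column(s)
msg <- tryCatch(select_checked(dd, v), error = function(e) conditionMessage(e))
```

Whether to stop on missing names or silently drop them with intersect() depends on whether a missing column indicates a typo that should be fixed or an optional field that can be skipped.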

Ben Bolker answered Sep 19 '22 14:09