Error in running randomForest : object not found

I am trying to fit a random forest classifier to my dataset. I am very new to R, and I imagine this is a simple formatting issue.

I read in a text file and transform my dataset so it is of this format: (taking out confidential info)

> head(df.train, 2)

         GOLGA8A     ITPR3   GPR174  SNORA63    GIMAP8     LEF1    PDE4B LOC100507043    TGFB1I1    SPINT1
Sample1 3.726046 3.4013711 3.794364 4.265287 -1.514573 7.725775 2.162616    -1.514573 -1.5145732 -1.514573
Sample2 4.262779 0.9261892 4.744096 7.276971 -1.514573 4.694769 4.707387     2.031476 -0.8325444  2.615991
...
...
             CD8B     FECH    PYCR1 MGC12916     KCNA3 resp
Sample1 -1.514573 2.099336 3.427928 1.542951 -1.514573    1
Sample2 -1.145806 1.204241 2.846832 1.523808  1.616791    1

In essence, the columns are my features and the rows are my samples. The last column, resp, is my response vector, stored as a factor.

Then I use:

set.seed(1) #Set the seed in order to gain reproducibility

RF1 <- randomForest(resp ~ ., data = df.train, ntree = 1000, importance = TRUE, mtry = 3)

I am simply trying to train the RF on my resp column, using the other columns as features.

But I obtain the error:

Error in eval(expr, envir, enclos) : object 'PCNA-AS1' not found

However, looking at my training set I can clearly find that column, e.g. with:

sort(unique(colnames(df.train)))

So I don't really understand the error or where to go from here. My apologies if I haven't posed the question in the correct way, thanks for any and all help!

AHawks asked Jan 29 '16 00:01

1 Answer

I suspect this comes from an illegal variable name in your data frame: PCNA-AS1 contains a hyphen, which is not allowed in a syntactically valid R name. Let's consider a data frame that has just a response variable resp and a variable (illegally) named PCNA-AS1:

(dat <- data.frame(`PCNA-AS1` = c(1, 2, 3), resp = factor(c("1", "1", "0")), check.names = FALSE))
#   PCNA-AS1 resp
# 1        1    1
# 2        2    1
# 3        3    0

Now when we train a random forest we get the indicated error:

library(randomForest)
mod <- randomForest(resp~., data=dat)
# Error in eval(expr, envir, enclos) : object 'PCNA-AS1' not found
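
To check whether your own data frame has this problem, you can compare each column name with what make.names() would turn it into. A minimal sketch in base R; the data frame demo and its column names are made up for illustration and are not taken from the question's data:

```r
# Hypothetical data frame with two syntactically invalid column names.
demo <- data.frame(
  `PCNA-AS1` = c(1, 2, 3),
  `7SK`      = c(4, 5, 6),
  LEF1       = c(7, 8, 9),
  resp       = factor(c("1", "1", "0")),
  check.names = FALSE  # keep the names exactly as given
)

# A name is invalid if make.names() would have to change it.
bad <- names(demo)[names(demo) != make.names(names(demo))]
bad
# [1] "PCNA-AS1" "7SK"
```

Any name listed in bad contains a character that is not allowed in a syntactic name (such as a hyphen) or starts with a digit.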

A natural fix is to convert all of your variable names to legal ones:

names(dat) <- make.names(names(dat))
dat
#   PCNA.AS1 resp
# 1        1    1
# 2        2    1
# 3        3    0
mod <- randomForest(resp~., data=dat)

Now the model trains with no error.
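
One thing to watch for: make.names() can map two different original names to the same legal name, and you may later want to report results under the original names. A sketch of one way to handle both, assuming hypothetical column names (orig, clean and name_map are illustrative and not part of the question's data):

```r
# Hypothetical original names; the first two collapse to the same legal name.
orig <- c("PCNA-AS1", "PCNA.AS1", "LEF1", "resp")

# unique = TRUE appends a suffix to duplicates so no two columns collide.
clean <- make.names(orig, unique = TRUE)
clean
# [1] "PCNA.AS1"   "PCNA.AS1.1" "LEF1"       "resp"

# Lookup table from cleaned name back to the original name.
name_map <- setNames(orig, clean)
name_map[["PCNA.AS1"]]
# [1] "PCNA-AS1"
```

After assigning clean as the data frame's names and training, name_map can be indexed with the cleaned variable names (for example, the row names of the importance table) to recover the original labels.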

josliber answered Nov 19 '22 09:11