Regression tree in R

I am having trouble making a regression tree in R. I have a data frame with 17 attributes, and I fit the tree like this:

library(rpart)
rt.model <- rpart(razlika ~ ., learn)

I get an error:

Error in `[.data.frame`(frame, predictors) : undefined columns selected

This seems weird, because I did something like that with a very similar data set. You can download the data frame from http://uploading.com/files/de8a966d/exa.Rda/ and then load it with

load("exa.Rda")
Borut Flis asked Dec 05 '11 16:12


3 Answers

The problem is not, I believe, that you have a matrix rather than a data frame. When I download and then load your data set, I get a data frame, not a matrix.

The problem is that your column names contain characters that are not valid in R variable names. Use gsub to remove the characters "-", " ", "(" and ")" from the column names. Or you can simply redefine the column names yourself entirely using colnames.
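A minimal sketch of the gsub approach, using a small made-up data frame rather than the real learn data. The column names "A-fg2" and "B avg (conceded)" are assumptions chosen only to contain the characters mentioned above; the actual names in exa.Rda may differ.

```r
# Hypothetical stand-in for `learn`; the real column names may differ.
learn_demo <- data.frame(1:3, 4:6, 7:9, check.names = FALSE)
colnames(learn_demo) <- c("razlika", "A-fg2", "B avg (conceded)")

# Strip "-", "(", ")" and spaces from every column name.
# Inside a bracket expression a leading "-" is taken literally.
colnames(learn_demo) <- gsub("[-() ]", "", colnames(learn_demo))

print(colnames(learn_demo))
```

After this step every name should be a plain alphanumeric string, which the formula razlika ~ . can expand without trouble.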

Or do as ulvund does and simply call data.frame, which forces R to do the column name cleaning for you, by default.
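As a rough illustration of that cleaning step: data.frame has a check.names argument (TRUE by default) that passes the names through make.names, which replaces invalid characters with ".". The names below are invented for the example and are not taken from exa.Rda.

```r
# Hypothetical names containing "-", spaces and parentheses.
bad <- c("razlika", "A-fg2", "B avg (conceded)")

# A small matrix carrying those names, standing in for the real data.
m <- matrix(1:9, ncol = 3, dimnames = list(NULL, bad))

# data.frame() cleans the names because check.names = TRUE by default.
cleaned <- data.frame(m)

print(names(cleaned))
print(make.names(bad, unique = TRUE))  # the same transformation, called directly
```

This matches the dotted names, such as B.avg.conceded., that appear in the tree output of the other answer.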

When I do this, rpart runs just fine.

joran answered Sep 21 '22 16:09


Turn your learn matrix into a data frame.

Example:

load("exa.Rda")
library(rpart)
learn <- data.frame(learn)
rt.model <- rpart(razlika ~ ., learn)
rt.model

yields:

n= 226 

node), split, n, deviance, yval
      * denotes terminal node

  1) root 226 31417.5100   3.3849560  
    2) B.reb>=40.80799 117 12661.2300   0.4871795  
      4) B.ft>=0.7666193 31  2685.4190  -5.7741940  
        8) A.fg2< 0.4645683 22  1846.7730  -8.3181820  
         16) A.ft< 0.7464692 7   365.4286 -14.2857100 *
         17) A.ft>=0.7464692 15  1115.7330  -5.5333330 *
        9) A.fg2>=0.4645683 9   348.2222   0.4444444 *
      5) B.ft< 0.7666193 86  8322.3720   2.7441860  
       10) B.avg.conceded.< 98.19592 76  7255.6320   1.7105260  
         20) A.reb< 39.29941 19  1520.6320  -3.5789470 *
         21) A.reb>=39.29941 57  5026.2110   3.4736840  
           42) A.3pt< 0.3945418 35  2500.1710   0.7714286  
             84) A.ft< 0.7460665 17  1270.2350  -2.4705880 *
             85) A.ft>=0.7460665 18   882.5000   3.8333330 *
           43) A.3pt>=0.3945418 22  1863.8640   7.7727270  
             86) B.ft>=0.7214165 13   718.9231   4.0769230 *
             87) B.ft< 0.7214165 9   710.8889  13.1111100 *
       11) B.avg.conceded.>=98.19592 10   368.4000  10.6000000 *
    3) B.reb< 40.80799 109 16719.2500   6.4954130  
      6) A.fouls>=24.51786 23  2349.9130  -2.2173910  
       12) A.fg2< 0.4551468 16  1266.0000  -5.5000000 *
       13) A.fg2>=0.4551468 7   517.4286   5.2857140 *
      7) A.fouls< 24.51786 86 12156.3800   8.8255810  
       14) B.fouls< 22.80863 24  3271.9580   2.5416670  
         28) A.3pt< 0.3738479 9   626.0000  -6.0000000 *
         29) A.3pt>=0.3738479 15  1595.3330   7.6666670 *
       15) B.fouls>=22.80863 62  7569.8710  11.2580600  
         30) A.fouls< 22.32999 18  1650.5000   5.5000000 *
         31) A.fouls>=22.32999 44  5078.4320  13.6136400  
           62) A.ft.drawn>=29.18849 7   208.8571   3.8571430 *
           63) A.ft.drawn< 29.18849 37  4077.1890  15.4594600  
            126) A.fg2< 0.4588535 18  1696.5000  11.5000000 *
            127) A.fg2>=0.4588535 19  1831.1580  19.2105300 *
abcde123483 answered Sep 20 '22 16:09


This can also happen if column names are integers (1:N), even though they are stored as characters.
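One way to spot such names before fitting is to compare them with what make.names would produce. This is only a sketch on an invented data frame; the column names "1" and "2" are assumptions for illustration.

```r
# Hypothetical data frame whose predictor columns are named "1" and "2".
df <- data.frame(y = c(1, 2, 3), a = c(4, 5, 6), b = c(7, 8, 9))
names(df) <- c("y", "1", "2")

# TRUE marks a name that is not a syntactically valid R name.
is_bad <- names(df) != make.names(names(df))
print(is_bad)

# Repair the names; make.names prefixes names that start with a digit with "X".
names(df) <- make.names(names(df), unique = TRUE)
print(names(df))
```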

Chris answered Sep 21 '22 16:09