How do I interpret rpart splits on factor variables when building classification trees in R?

If the factor variable is Climate, with 4 possible values: Tropical, Arid, Temperate, Snow, and a node in my rpart tree is labeled as "Climate:ab", what is the split?

Asked by user281537 on Apr 08 '10 at 02:04


1 Answer

I assume you plot the tree the standard way:

plot(f)
text(f)

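As the help page for text.rpart explains, by default (argument pretty) the levels of factor variables are shown as letters: a means levels(Climate)[1], b means levels(Climate)[2], and so on. The split label is printed for the left branch, so "Climate:ab" means that observations with Climate equal to levels(Climate)[1] or levels(Climate)[2] go to the left node and the others go to the right. Keep in mind that, unless the levels were set explicitly, factor() sorts them alphabetically (Arid, Snow, Temperate, Tropical), so here a and b would be Arid and Snow, not Tropical and Arid.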
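The letter-to-level mapping can be checked by hand. The sketch below uses a hypothetical helper, decode_levels(), which is not part of rpart; it assumes lowercase letters only (a factor with at most 26 levels) and, for the "everything else goes right" step, a node that all levels reach (such as the root).

```r
# Hypothetical helper, not part of rpart: turn the letter code printed by
# text.rpart() for a factor split (the "ab" in "Climate:ab") into level names.
# Assumes lowercase letters only, i.e. a factor with at most 26 levels.
decode_levels <- function(code, lvls) {
  idx <- match(strsplit(code, "")[[1]], letters)  # "ab" -> c(1, 2)
  if (anyNA(idx) || any(idx > length(lvls)))
    stop("'code' contains a letter that does not match any level")
  lvls[idx]
}

# factor() sorts the levels alphabetically unless 'levels' is given:
# Arid, Snow, Temperate, Tropical (not the order the values were typed in)
climate <- factor(c("Tropical", "Arid", "Temperate", "Snow"))

left  <- decode_levels("ab", levels(climate))  # levels sent to the left child
right <- setdiff(levels(climate), left)        # remaining levels go right
print(left)   # "Arid" "Snow"
print(right)  # "Temperate" "Tropical"
```

If the factor was built with an explicit levels argument, pass that same levels() vector to the helper, since the letters follow the factor's level order.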

To print the level names instead of letters, use

plot(f)
text(f, pretty=1)
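The full level names are also available without plotting. As far as I know, print.rpart() does no abbreviation by default (minlength = 0), and labels() dispatches to labels.rpart(), which accepts the same minlength argument. A minimal sketch, assuming the fake data used for the plots in this answer:

```r
library(rpart)

X <- data.frame(
  y = rep(1:4, 25),
  Climate = factor(rep(c("Tropical", "Arid", "Temperate", "Snow"), 25))
)
f <- rpart(y ~ Climate, data = X)

# Text summary of the tree; factor splits are listed with full level names
print(f)

# One label per node (the root is labelled "root"), with
# minlength = 0 meaning "do not abbreviate factor levels"
labels(f, minlength = 0)
```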

[Figure: tree drawn with rpart]

but I recommend draw.tree from the maptree package:

library(maptree)
draw.tree(f)

[Figure: tree drawn with maptree]

I used fake data to make the plots:

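library(rpart)
X <- data.frame(
    y = rep(1:4, 25),  # numeric y gives a regression tree; factor splits are labelled the same way
    # factor() makes the (alphabetical) levels explicit; since R 4.0.0
    # data.frame() no longer converts strings to factors by default
    Climate = factor(rep(c("Tropical", "Arid", "Temperate", "Snow"), 25))
)
f <- rpart(y ~ Climate, X)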
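To read the splits straight from the fitted object, the rpart.object documentation describes f$csplit as having one row per categorical split, with 1 for a level sent left, 3 for a level sent right, and 2 for a level not present at that node. The sketch below relies on that layout. factor_splits() is a hypothetical helper, and the row bookkeeping (one primary split, then competitors, then surrogates per internal node) is my assumption based on the same documentation.

```r
library(rpart)

X <- data.frame(
  y = rep(1:4, 25),
  Climate = factor(rep(c("Tropical", "Arid", "Temperate", "Snow"), 25))
)
f <- rpart(y ~ Climate, data = X)

# Hypothetical helper: for each internal node whose primary split is on a
# factor, list which levels go to the left and to the right child.
factor_splits <- function(fit) {
  ff <- fit$frame
  internal <- ff$var != "<leaf>"
  # Each internal node owns 1 primary + ncompete + nsurrogate rows of
  # fit$splits; leaves own none. first_row[i] is node i's primary split row.
  first_row <- cumsum(c(1, ff$ncompete + ff$nsurrogate + internal))
  out <- list()
  for (i in which(internal)) {
    r <- first_row[i]
    var <- as.character(ff$var[i])
    if (fit$splits[r, "ncat"] > 1) {           # ncat > 1: categorical split
      lvls <- attr(fit, "xlevels")[[var]]
      codes <- fit$csplit[fit$splits[r, "index"], seq_along(lvls)]
      out[[rownames(ff)[i]]] <- list(          # row names are node numbers
        variable = var,
        left  = lvls[codes == 1],              # 1 = level goes left
        right = lvls[codes == 3]               # 3 = level goes right
      )
    }
  }
  out
}

str(factor_splits(f))
```

Levels coded 2 (absent from a node) are left out of both lists, so deeper nodes only show the levels that actually reach them.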
Answered by Marek on Sep 21 '22 at 02:09