I have fitted a model with rpart and would like to label each observation with its terminal node of the tree, and link that label to the rule corresponding to the terminal node.
I have used the following code as an example:
library(rpart)
library(rattle)
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)
table(fit$where)
rattle::asRules(fit)
I'm able to label each observation via fit$where. The labels are:
> table(fit$where)
3 5 7 8 9
29 12 14 7 19
First question: these labels do not correspond to the labels generated by rattle::asRules(fit), which are 3, 23, 22, 10 and 4. How can I generate the mapping table between the two?
Second question: asRules only prints to standard output, while I would like to have the rules in a table.
My expected result: a data frame with a mapping between fit$where and the asRules labels, plus another column with the rule text as a string, e.g.:
Rule number: 4 [Kyphosis=absent cover=29 (36%) prob=0.00]
Start>=8.5
Start>=14.5
If the text can be parsed into ID, statistics and conditions in separate columns, even better, but that is not mandatory.
I have found many related questions and links, but no final answer.
Thanks much, Kamashay
Progress update 29/01
I'm able to extract each rule separately if I have the rule ID, via path.rpart:
>path.rpart(fit,node=22)
node number: 22
root
Start>=8.5
Start< 14.5
Age>=55
Age>=111
This gets me the rule as a list that I can convert to a string. However, the IDs are consistent with the 'asRules' function and not with 'fit$where'...
Using "partykit" gets me the same results as "fit$where":
library("partykit")
> table(predict(as.party(fit), type = "node"))
3 5 7 8 9
29 12 14 7 19
So I'm still not able to link the two (asRules IDs and fit$where IDs). I'm probably missing something fundamental, or there's a more straightforward way to do the task.
Can you help?
fit$where holds, for each observation, the row index into fit$frame of its leaf, and the row names of fit$frame are the node numbers. So you can find the rule number (in fact the leaf node number) corresponding to each fit$where using
> row.names(fit$frame)[fit$where]
[1] "3" "22" "3" "3" "4" "4" ...
You might get a little closer to your desired output with
> rattle::asRules(fit, TRUE)
R 3 [23%,0.58] Start< 8.5
R 23 [ 9%,0.57] Start>=8.5 Start< 14.5 Age>=55 Age< 111
...
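Building on the two pieces above (row.names(fit$frame) for the node numbers and path.rpart for the conditions), a minimal sketch of the requested mapping table could look like the following. The column names (where, node, n, class, rule) and the " & " separator are arbitrary choices for illustration, not something rpart or rattle defines.

```r
library(rpart)

fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)

# Distinct leaf labels as used by fit$where (row indices into fit$frame)
leaf_rows <- sort(unique(fit$where))

# Row names of fit$frame are the node numbers used by asRules / path.rpart
leaf_nodes <- as.numeric(row.names(fit$frame)[leaf_rows])

# path.rpart returns a list named by node number; each element is a
# character vector starting with "root" followed by the split conditions
paths <- path.rpart(fit, nodes = leaf_nodes, print.it = FALSE)
paths <- paths[as.character(leaf_nodes)]

mapping <- data.frame(
  where = leaf_rows,
  node  = leaf_nodes,
  n     = fit$frame$n[leaf_rows],                           # observations in leaf
  class = attr(fit, "ylevels")[fit$frame$yval[leaf_rows]],  # predicted class
  rule  = unname(vapply(paths,
                        function(p) paste(p[-1], collapse = " & "),
                        character(1))),
  row.names = NULL,
  stringsAsFactors = FALSE
)

print(mapping)
```

To attach the rule to every observation, something like mapping[match(fit$where, mapping$where), ] should work, since where is unique per row of the mapping.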
Did you mean something like this?
library(rpart)
library(rpart.utils)
library(dplyr)
# model
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)
# data frame holding each leaf node's rule and its combination of subrules
rule_df <- rpart.rules.table(fit) %>%
  filter(Leaf == TRUE) %>%
  group_by(Rule) %>%
  summarise(Subrules = paste(Subrule, collapse = ","))
# final data frame: one row per observation, labeled with its leaf node number
df <- kyphosis %>%
  mutate(Rule = row.names(fit$frame)[fit$where]) %>%
  left_join(rule_df, by = "Rule")
head(df)
# subrule table
rpart.subrules.table(fit)
Output is:
Kyphosis Age Number Start Rule Subrules
1 absent 71 3 5 3 R1
2 absent 158 3 14 22 L1,R2,R3,L4
3 present 128 4 5 3 R1
4 absent 2 5 1 3 R1
5 absent 1 4 15 4 L1,L2
6 absent 1 2 16 4 L1,L2
Subrule definition:
Subrule Variable Value Less Greater
1 L1 Start 8.5 <NA> 8.5
2 L2 Start 14.5 <NA> 14.5
3 L3 Age <NA> 55 <NA>
4 L4 Age 111 <NA> 111
5 R1 Start <NA> 8.5 <NA>
6 R2 Start <NA> 14.5 <NA>
7 R3 Age 55 <NA> 55
8 R4 Age <NA> 111 <NA>