
How to specify split in a decision tree in R programming?

I am trying to apply a decision tree here. The decision tree takes care of splitting at each node itself, but at the first node I want to split my tree on the basis of "Age". How do I force that?

library(party)    
fit2 <- ctree(Churn ~ Gender + Age + LastTransaction + Payment.Method + spend + marStat, data = tsdata)
Yogesh asked Oct 04 '16


People also ask

How do you decide a split in decision tree?

Steps to split a decision tree using Information Gain: For each split, individually calculate the entropy of each child node. Calculate the entropy of each split as the weighted average entropy of child nodes. Select the split with the lowest entropy or highest information gain.
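A minimal R sketch of that calculation (not from the thread; the helper functions and the example split are purely illustrative):

entropy <- function(y) {
  p <- prop.table(table(y))
  -sum(ifelse(p > 0, p * log2(p), 0))   # guard against log2(0) for empty levels
}

info_gain <- function(y, left) {
  # weighted average entropy of the two child nodes
  child <- mean(left) * entropy(y[left]) + mean(!left) * entropy(y[!left])
  entropy(y) - child
}

# example: information gain from splitting iris on Petal.Length < 2.5
info_gain(iris$Species, iris$Petal.Length < 2.5)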

How is a splitting point chosen for continuous variables in decision trees?

In order to come up with a split point, the values are sorted, and the mid-points between adjacent values are evaluated in terms of some metric, usually information gain or gini impurity.
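For example, a small sketch in R (Gini impurity and the iris data are just illustrative choices, not from the thread):

gini <- function(y) 1 - sum(prop.table(table(y))^2)

x <- iris$Petal.Length
y <- iris$Species
sv <- sort(unique(x))
cuts <- (head(sv, -1) + tail(sv, -1)) / 2   # mid-points between adjacent values
score <- sapply(cuts, function(cp) {
  left <- x <= cp
  mean(left) * gini(y[left]) + mean(!left) * gini(y[!left])
})
cuts[which.min(score)]   # candidate with the lowest weighted impurity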

How do decision trees split categorical variables?

A decision tree, while performing recursive binary splitting, selects an independent variable (say Xj) and a threshold (say t) such that the predictor space is split into the regions {X|Xj < t} and {X|Xj >= t}, choosing the pair (Xj, t) that leads to the greatest reduction in the cost function.

What is min split in decision tree?

The minsplit parameter is the smallest number of observations in the parent node that could be split further. The default is 20. If you have less than 20 records in a parent node, it is labeled as a terminal node. Finally, the maxdepth parameter prevents the tree from growing past a certain depth / height.
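These parameter names and the default of 20 correspond to rpart's control settings; a brief sketch (formula and data are just placeholders, and partykit's ctree_control() accepts minsplit and maxdepth as well):

library(rpart)
fit <- rpart(Species ~ ., data = iris,
  control = rpart.control(minsplit = 20, maxdepth = 3))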


2 Answers

There is no built-in option to do that in ctree(). The easiest method to do this "by hand" is simply:

  1. Learn a tree with only Age as the explanatory variable and maxdepth = 1 so that it creates only a single split.

  2. Split your data using the tree from step 1 and create a subtree for the left branch.

  3. Split your data using the tree from step 1 and create a subtree for the right branch.

This does what you want (although I typically wouldn't recommend doing so...).

If you use the ctree() implementation from partykit, you can also merge these three trees back into a single tree for visualization, prediction, etc. It requires a bit of hacking but is feasible.

I will illustrate this using the iris data, forcing a split on the variable Sepal.Length, which otherwise wouldn't be used in the tree. Learning the three trees described above is easy:

library("partykit")
data("iris", package = "datasets")
tr1 <- ctree(Species ~ Sepal.Length,     data = iris, maxdepth = 1)
tr2 <- ctree(Species ~ Sepal.Length + ., data = iris,
  subset = predict(tr1, type = "node") == 2)
tr3 <- ctree(Species ~ Sepal.Length + ., data = iris,
  subset = predict(tr1, type = "node") == 3)

Note, however, that it is important to use the formula with Sepal.Length + . to ensure that the variables in the model frame are ordered in exactly the same way in all trees.
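A quick way to see this (not part of the original answer) is to look at the column names of the corresponding model frame; Sepal.Length always ends up as the first explanatory column, so the variable indices used by the splits stay consistent across the trees:

names(model.frame(Species ~ Sepal.Length + ., data = iris))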

Next comes the most technical step: we need to extract the raw node structure from all three trees, fix up the node ids so that they are in a proper sequence, and then integrate everything into a single node:

fixids <- function(x, startid = 1L) {
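  ## renumber the nodes of 'x' consecutively, starting at 'startid'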
  id <- startid - 1L
  new_node <- function(x) {
    id <<- id + 1L
    if(is.terminal(x)) return(partynode(id, info = info_node(x)))
    partynode(id,
      split = split_node(x),
      kids = lapply(kids_node(x), new_node),
      surrogates = surrogates_node(x),
      info = info_node(x))
  }

  return(new_node(x))   
}
no <- node_party(tr1)
no$kids <- list(
  fixids(node_party(tr2), startid = 2L),
  fixids(node_party(tr3), startid = 5L)
)
no
## [1] root
## |   [2] V2 <= 5.4
## |   |   [3] V4 <= 1.9 *
## |   |   [4] V4 > 1.9 *
## |   [5] V2 > 5.4
## |   |   [6] V4 <= 4.7
## |   |   |   [7] V4 <= 3.6 *
## |   |   |   [8] V4 > 3.6 *
## |   |   [9] V4 > 4.7
## |   |   |   [10] V5 <= 1.7 *
## |   |   |   [11] V5 > 1.7 *

And finally we set up a joint model frame containing all the data and combine that with the new joint tree. Some information on fitted nodes and the response is added so that the tree can be turned into a constparty for nice visualizations and predictions. See vignette("partykit", package = "partykit") for the background on this:

d <- model.frame(Species ~ Sepal.Length + ., data = iris)
tr <- party(no, 
  data = d,
  fitted = data.frame(
    "(fitted)" = fitted_node(no, data = d),
    "(response)" = model.response(d),
    check.names = FALSE),
  terms = terms(d)
)
tr <- as.constparty(tr)

And then we're done and can visualize our combined tree with the forced first split:

plot(tr)

[plot: combined tree with the forced first split]
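The merged constparty object also supports the standard predict() methods; for example (just a usage illustration, not from the original answer):

predict(tr, newdata = iris[c(1, 51, 101), ], type = "response")
predict(tr, newdata = iris[c(1, 51, 101), ], type = "node")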

Achim Zeileis answered Sep 29 '22


At every iteration, a decision tree will choose the best variable for splitting (either based on information gain / Gini index, as in CART, or on a chi-square-type association test, as in conditional inference trees). If another predictor variable separates the classes better than Age does, that variable will be chosen first.

I think, based on your requirement, you can do one of the following:

(1) Unsupervised: Discretize the Age variable (create bins, e.g., 0-20, 20-40, 40-60, etc., as per your domain knowledge), subset the data for each of the age bins, and then train a separate decision tree on each of these segments.
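A rough sketch of approach (1), assuming the data and column names from the question (tsdata, Age, Churn, etc.) and arbitrary bin boundaries:

library(party)
# bin Age (boundaries are just examples), then fit one tree per bin
tsdata$AgeBin <- cut(tsdata$Age, breaks = c(0, 20, 40, 60, Inf))
fits <- lapply(split(tsdata, tsdata$AgeBin), function(d)
  ctree(Churn ~ Gender + LastTransaction + Payment.Method + spend + marStat,
    data = d))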

(2) Supervised: Keep dropping the other predictor variables until Age is chosen first. You will then get a decision tree where Age is the first split variable. Use the rule on Age created by the tree (e.g., Age <= 36 vs. Age > 36) to subset the data into two parts, and then learn a full decision tree with all the variables on each part separately.
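A sketch of approach (2), assuming the first tree split Age at 36 (the threshold is only an example, and the column names are taken from the question):

fit_left  <- ctree(Churn ~ ., data = subset(tsdata, Age <= 36))
fit_right <- ctree(Churn ~ ., data = subset(tsdata, Age >  36))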

(3) Supervised ensemble: You can use a random forest classifier to see how important your Age variable is.
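A sketch of approach (3) with the randomForest package (column names again taken from the question; Churn should be a factor for classification):

library(randomForest)
rf <- randomForest(Churn ~ Gender + Age + LastTransaction + Payment.Method +
    spend + marStat, data = tsdata, importance = TRUE)
importance(rf)    # higher values indicate a more influential predictor
varImpPlot(rf)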

Sandipan Dey answered Sep 29 '22