In simple linear regression, when the data are in a data frame, we can use the dot shorthand to write the formula more concisely, for example:
lm(my_dep_var ~ .-var1, data=my_df)
This returns a model that uses every variable except var1 as an independent variable.
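As a quick illustration of the shorthand, here is a minimal sketch using the built-in mtcars data set. The data set and the names fit_lm and predictors are just choices for this example and have nothing to do with my real data.

```r
# Fit mpg on every other column of mtcars except cyl
fit_lm <- lm(mpg ~ . - cyl, data = mtcars)

# Names of the predictors that ended up in the model (intercept dropped)
predictors <- setdiff(names(coef(fit_lm)), "(Intercept)")
print(predictors)
```

The excluded column (cyl here) should not appear among the printed predictor names.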
However, when I try the same formula in the rpart function, I get an error:
> tree1 <- rpart(Reverse ~ .-Circuit, data=train[a], method="class", minbucket=25)
Error in rpart(Reverse ~ . - Circuit, data = train[a], method = "class", :
NAs are not allowed in subscripted assignments
>
I have no NAs in my data, and the rpart call without the minus (but with the dot: Reverse ~ .) seems to work well. So it seems I can't use the minus sign in an rpart formula.
Is that indeed the case? Where can I read about this in the documentation?
EDIT: here is a simplified code that generates this kind of error:
library(rpart)
var1 <- as.factor(c(1, 1, 1, 0, 1))
var2 <- c(0, 0, 0, 0, 0)
var3 <- factor(c("2", "9", "5", "5", "5"), levels = c("2", "3", "4", "5", "8", "9"))
var4 <- factor(c("EA", "EA", "EA", "EA", "JP"), levels = c("EA", "CR", "CA", "JP"))
dtf <- data.frame(var1, var2, var3, var4)
rpart(var1 ~ . - var4, data = dtf, method = "class", minbucket = 25)
EDIT: new code.
I think the problem was the character/factor variables. To work around it, I created a data frame with dummy variables, so that every predictor is numeric.
# Sample size
N <- 10000
# Creating the data frame
var1 <- sample(c(0, 1), N, replace = TRUE)
var2 <- sample(c(0), N, replace = TRUE)
var3 <- as.factor(sample(c("2", "9", "5", "5", "5"), N, replace = TRUE))
var4 <- as.factor(sample(c("EA", "EA", "EA", "EA", "JP"), N, replace = TRUE))
dtf <- data.frame(var1, var2, var3, var4)
# Loading the packages
library(rpart)
library(caret)
# One-hot encoding: creating dummy variables
dummies <- dummyVars(~ ., data = dtf)
dtf2 <- as.data.frame(predict(dummies, newdata = dtf))
# Fitting the model; var4.EA is one of the dummy columns created above
fit <- rpart(var1 ~ . - var4.EA, data = dtf2, method = "class", minbucket = 25)
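If one-hot encoding is more than you need, another option is to drop the unwanted column from the data frame and keep the plain `~ .` formula, which the question reports as working. This is only a sketch under that assumption, not something taken from the rpart documentation; the seed, the smaller sample size, and the names keep and fit_drop are choices made for this example.

```r
library(rpart)

set.seed(1)  # arbitrary seed, only so the example is reproducible
N <- 1000    # smaller than above, just to keep the example quick

dtf <- data.frame(
  var1 = sample(c(0, 1), N, replace = TRUE),
  var2 = rep(0, N),
  var3 = factor(sample(c("2", "9", "5"), N, replace = TRUE)),
  var4 = factor(sample(c("EA", "JP"), N, replace = TRUE))
)

# Drop var4 from the data instead of subtracting it in the formula
keep <- setdiff(names(dtf), "var4")
fit_drop <- rpart(var1 ~ ., data = dtf[, keep], method = "class", minbucket = 25)

# Predictors the model was given; var4 should not be among them
print(attr(fit_drop$terms, "term.labels"))
```

This keeps the remaining factor columns as factors, so the tree can still split on their levels directly.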
I hope that solves your problem.