
rpart formula parameter usage: "NAs are not allowed in subscripted assignments"

Tags:

r

rpart

In simple linear regression, when the data is in a data frame, the dot shorthand makes the formula easier to write, for example:

lm(my_dep_var ~ .-var1, data=my_df)

This returns the model with all variables except var1 as independent variables.
However, when I try the same formula in the rpart function, I get an error:

> tree1 <- rpart(Reverse ~ .-Circuit, data=train[a], method="class", minbucket=25)
Error in rpart(Reverse ~ . - Circuit, data = train[a], method = "class",  : 
  NAs are not allowed in subscripted assignments
> 

I have no NAs in my data, and the rpart command without the minus (but with the dot: Reverse ~ .) seems to work well. So it seems I can't use the minus sign in an rpart formula. Is that indeed the case? Where can I read about this kind of thing in the documentation?

EDIT: here is simplified code that generates this kind of error:

var1 <- as.factor(c(1,1,1,0,1))
var2 <- c(0,0,0,0,0)
var3 <- factor(c("2", "9", "5", "5", "5"), levels=c("2","3","4","5","8","9"))
var4 <- factor(c("EA", "EA", "EA", "EA", "JP"), levels=c("EA", "CR", "CA", "JP"))
dtf <- data.frame(var1, var2, var3, var4)
rpart(var1 ~.-var4 ,data=dtf, method="class", minbucket=25)
asked Dec 09 '25 05:12 by d_e


1 Answer

EDIT: new code.

I think the problem was the character/factor variables. To work around it, I created a data frame with dummy variables, so that every predictor is numeric before it reaches rpart.

# Sample size
N <- 10000

# Creating the df
var1 <- sample(c(0, 1), N, replace = TRUE)
var2 <- rep(0, N)
var3 <- as.factor(sample(c("2", "9", "5", "5", "5"), N, replace = TRUE))
var4 <- as.factor(sample(c("EA", "EA", "EA", "EA", "JP"), N, replace = TRUE))
dtf <- data.frame(var1, var2, var3, var4)

# Loading the packages
library(rpart)
library(caret)

# One-hot encoding: creating dummy variables
dummies <- dummyVars(~ ., data = dtf)
dtf2 <- as.data.frame(predict(dummies, newdata = dtf))

# Fitting the model
fit <- rpart(var1 ~ . - var4.EA, data = dtf2, method = "class", minbucket = 25)
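
If the goal is only to leave one predictor out, another option that may be simpler is to drop the column from the data frame and keep the plain dot formula, which the question reports as working. The following is a minimal sketch under that assumption, not a statement about how rpart handles the minus sign internally; the names dtf_sub and fit_sub, the seed, and the simulated data are illustrative only.

```r
library(rpart)

# Illustrative data in the same shape as above (the seed is an arbitrary choice)
set.seed(1)
N <- 10000
dtf <- data.frame(
  var1 = sample(c(0, 1), N, replace = TRUE),
  var2 = rep(0, N),
  var3 = factor(sample(c("2", "9", "5"), N, replace = TRUE)),
  var4 = factor(sample(c("EA", "JP"), N, replace = TRUE))
)

# Remove the unwanted predictor from the data instead of from the formula
dtf_sub <- dtf[, setdiff(names(dtf), "var4"), drop = FALSE]

# The dot now expands to the remaining columns only
fit_sub <- rpart(var1 ~ ., data = dtf_sub, method = "class", minbucket = 25)
```

This keeps the factor columns as factors, so no dummy coding is needed, at the cost of building a second data frame.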

I hope that solves your problem.

answered Dec 10 '25 18:12 by Diego Rodrigues


