Getting invalid model formula in ExtractVars when using rpart function in R

Tags: r, rpart

The dataset can be downloaded from http://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/

Getting the following error:

formula(formula, data = data) : 
  invalid model formula in ExtractVars

Using the following code:

install.packages("rpart")
library("rpart")

# you'll need to change the following from windows to work on a linux box:
mydata <- read.csv(file="c:/Users/md7968/downloads/winequality-red.csv")

# grow tree 
fit <- rpart(YouSweetBoy ~ "residual sugar" + "citric acid", method = "class", data = mydata)

Mind you I've changed the delimiters in the CSV file to commas.

Perhaps it's not reading the data correctly. Forgive me, I'm new to R and not a very good programmer.

dgene54 asked Jan 14 '15 20:01


2 Answers

Look at names(mydata). When read.csv() builds the data.frame (it calls read.table() under the hood), it converts syntactically invalid column names into valid ones by default (check.names = TRUE). You can't (well, shouldn't) have a space in a column name, so R changes spaces to periods. Also, you should never put quoted strings in a formula; refer to variables by their bare names. Try

fit <- rpart(quality ~ residual.sugar + citric.acid, method = "class", data = mydata)

(I have no idea what "YouSweetBoy" was supposed to be, since it isn't in the dataset, so I changed it to "quality".)
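To see the renaming for yourself, here is a small sketch using base R only. The inline CSV text is a made-up stand-in for the wine file (three of its column headers plus quality, with invented values), not the real data. It also shows that if you really want to keep the spaced names, backticks (not quotes) are what a formula accepts.

```r
# Tiny inline stand-in for the wine data (values are made up for illustration).
csv_text <- 'fixed acidity,citric acid,residual sugar,quality
7.4,0.00,1.9,5
7.8,0.04,2.6,5
11.2,0.56,1.9,6'

# Default behaviour: check.names = TRUE, so spaces become periods.
fixed <- read.csv(text = csv_text)
print(names(fixed))

# check.names = FALSE keeps the original headers, spaces included.
raw <- read.csv(text = csv_text, check.names = FALSE)
print(names(raw))

# Non-syntactic names must be wrapped in backticks (not quotes) in a formula.
f <- quality ~ `residual sugar` + `citric acid`
print(all.vars(f))
```

With the default settings the period-separated names are usually the least painful option, which is why the call above uses residual.sugar and citric.acid.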

MrFlick answered Oct 31 '22 12:10


Removing the spaces from the independent variable names and taking off the quotes made it work.

Instead of "residual sugar", use residual_sugar (with the column itself renamed to match).
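If you would rather not edit the CSV header by hand, the renaming can be done in R after loading. This is only a sketch: the small data frame below is a hypothetical stand-in for mydata with invented values, built with check.names = FALSE so the spaced names survive.

```r
# Hypothetical stand-in data frame with the spaced headers from the CSV.
mydata <- data.frame(
  "residual sugar" = c(1.9, 2.6, 2.3),
  "citric acid"    = c(0.00, 0.04, 0.56),
  quality          = c(5, 5, 6),
  check.names = FALSE
)

# Replace every space in the column names with an underscore.
names(mydata) <- gsub(" ", "_", names(mydata), fixed = TRUE)
print(names(mydata))

# The renamed columns can now be used unquoted in a formula.
f <- quality ~ residual_sugar + citric_acid
mf <- model.frame(f, data = mydata)
print(dim(mf))
```

The same formula can then be passed to rpart() in place of the quoted version from the question.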

Srinivasan S answered Oct 31 '22 12:10