I have a vector of column names for a very large data frame.
Let's assume: x = c("Name", "address", "Gender", ......, "class" )
[approximately 100 variables]
Now I would like to create a formula, which I'll eventually use to build a HoeffdingTree.
I'm creating the formula using:
myformula <- as.formula(paste("class ~ ", paste(x, collapse= "+")))
This throws the following error:
Error in parse(text = x) : <text>:1:360: unexpected 'else'
1: e+spread+prayforsonni+just+want+amp+argue+blxcknicotine+mood+now+right+actually+herapatra+must+simply+suck+there+always+cookies+ever+everything+getting+nice+nigga+they+times+abu+all+alliepickl
The paste part of the statement works fine, but passing its result to as.formula throws all kinds of weird errors.
The problem is that some of your column names are R reserved words. else is a keyword, so it can't be used as a regular (syntactic) name.
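To find out which of the ~100 names are causing trouble, one option is base R's make.names(): it appends a dot to reserved words and rewrites other invalid names (spaces, leading digits), so any name it changes would break a formula built with paste(). A minimal sketch, using a hypothetical example vector rather than your real column names:

```r
# Hypothetical column names: "else" is a reserved word, "my var" contains a space
x <- c("Name", "address", "else", "my var", "class")

# make.names() leaves syntactic names untouched, so any name it changes
# is one that as.formula() cannot parse without backticks
bad <- x[make.names(x) != x]
print(bad)  # "else" "my var"
```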
A simplified example:
s <- c("x", "else", "z")
f <- paste("y~", paste(s, collapse="+"))
formula(f)
# Error in parse(text = x) : <text>:1:10: unexpected '+'
# 1: y~ x+else+
#             ^
The solution is to wrap each name in backticks (`) so that R treats it as a non-syntactic variable name rather than as a keyword.
f <- paste("y~", paste(sprintf("`%s`", s), collapse="+"))
formula(f)
# y ~ x + `else` + z
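The same idea can be packaged as a small reusable function that also drops the response from the predictor list (in the question, x contains "class" itself). This is a sketch under assumptions: build_formula is a hypothetical helper name, and it assumes no column name contains a backtick character, which this simple quoting would not escape.

```r
# Hypothetical helper: build "response ~ p1 + p2 + ..." with every name backquoted
build_formula <- function(response, predictors, env = parent.frame()) {
  # Drop the response so it is not also used as a predictor
  predictors <- setdiff(predictors, response)
  # Backticks make reserved words such as "else" parse as ordinary names
  quoted <- sprintf("`%s`", predictors)
  rhs <- paste(quoted, collapse = " + ")
  as.formula(paste(sprintf("`%s`", response), "~", rhs), env = env)
}

x <- c("Name", "address", "else", "class")
f <- build_formula("class", x)
print(f)
```

Passing env through to as.formula() keeps the formula's environment at the caller rather than inside the helper.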
Alternatively, you can reduce your data set first:
dat_small <- dat[, unique(c("class", x))]   # unique() avoids selecting "class" twice when x already contains it
and then use
myformula <- as.formula("class ~ .")
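The . stands for all other columns (everything except class). It is expanded against the data frame you pass as the data argument, so the column names are never parsed as R code.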
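A small end-to-end check of this approach, assuming a hypothetical toy data frame that contains a column literally named else. Here model.frame() stands in for whatever modelling function you eventually pass the formula and data to; it is not the HoeffdingTree call itself.

```r
# Hypothetical toy data frame standing in for the real ~100-column one
dat <- data.frame(
  Name   = c("a", "b", "c", "d"),
  `else` = c(1, 2, 3, 4),
  class  = factor(c("yes", "no", "yes", "no")),
  unused = c(0, 0, 1, 1),
  check.names = FALSE   # keep the reserved word "else" as a column name
)
x <- c("Name", "else", "class")

# Keep only the wanted columns; unique() avoids selecting "class" twice
dat_small <- dat[, unique(c("class", x))]

# "." is expanded against dat_small, so no column name goes through parse()
mf <- model.frame(class ~ ., data = dat_small)
print(dim(mf))
```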
You could also try reformulate:
reformulate(setdiff(x, 'class'), response='class')
#class ~ Name + address + Gender
where 'x' is
x <- c("Name", "address", "Gender", 'class')
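reformulate() still parses its term labels, so a reserved word in x would fail here just as with as.formula(). A sketch of one workaround, assuming the same backquoting trick shown earlier in this thread carries over: quote the predictors before handing them to reformulate(). The example vector is hypothetical.

```r
x <- c("Name", "address", "else", "class")

# Backquote each predictor so reformulate() parses reserved words as names
preds <- sprintf("`%s`", setdiff(x, "class"))
f <- reformulate(preds, response = "class")
print(f)
```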
If x contains R keywords, you can use the dot instead:
reformulate('.', response='class')
#class ~ .
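A different technique, not used in the answers above, avoids string parsing altogether: build the formula as a call object from symbols. as.name() turns any string into a symbol without running the parser, so reserved words pass through unchanged. This sketch assumes at least one predictor remains after removing the response, and the example vector is hypothetical.

```r
x <- c("Name", "address", "else", "class")

# Turn each predictor into a symbol; as.name() never runs the parser,
# so reserved words such as "else" are accepted as ordinary names
preds <- lapply(setdiff(x, "class"), as.name)

# Fold the symbols into a left-nested sum: (Name + address) + `else`
rhs <- Reduce(function(lhs, term) call("+", lhs, term), preds)

# Assemble the two-sided formula call; as.formula() evaluates it into a formula
f <- as.formula(call("~", as.name("class"), rhs))
print(f)
```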