Creating formula using very long strings in R

Tags: r, formula

I'm in a situation where I have a vector full of column names for a really large data frame.

Let's assume: x = c("Name", "address", "Gender", ......, "class" ) [approximately 100 variables]

Now, I would like to create a formula which I'll eventually use to build a HoeffdingTree. I'm creating the formula using:

myformula <- as.formula(paste("class ~ ", paste(x, collapse= "+")))

This throws up the following error:

Error in parse(text = x) : <text>:1:360: unexpected 'else'
1:e+spread+prayforsonni+just+want+amp+argue+blxcknicotine+mood+now+right+actually+herapatra+must+simply+suck+there+always+cookies+ever+everything+getting+nice+nigga+they+times+abu+all+alliepickl

The paste part of the statement above works fine, but passing its result to as.formula fails with errors like this one.

Jayaprakash Mara asked Apr 10 '15 07:04


3 Answers

The problem is that some of your column names are R reserved words. else is one of them, so the parser won't accept it as an ordinary variable name.

A simplified example:

s <- c("x", "else", "z")
f <- paste("y~", paste(s, collapse="+"))
formula(f)
# Error in parse(text = x) : <text>:1:10: unexpected '+'
# 1: y~ x+else+
#              ^

The solution is to wrap your words in backticks "`" so that R will treat them as non-syntactic variable names.

f <- paste("y~", paste(sprintf("`%s`", s), collapse="+"))
formula(f)
# y ~ x + `else` + z
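
With around 100 columns it can help to see which names are the problem before quoting anything. The following is a minimal sketch, not part of the original answer: the column names are made up, and it assumes that comparing each name against make.names() is an acceptable test for "needs backticks". It would not cope with names that themselves contain a backtick.

```r
# Hypothetical column names: "else" is a reserved word, "my var" contains a space.
cols <- c("x", "else", "my var", "z")

# make.names() rewrites anything that is not a valid R name,
# so a mismatch flags the names that need quoting.
needs_quote <- make.names(cols) != cols
cols[needs_quote]

# Backtick-quote only the flagged names, then build the formula.
terms_txt <- ifelse(needs_quote, sprintf("`%s`", cols), cols)
f <- as.formula(paste("y ~", paste(terms_txt, collapse = " + ")))

# all.vars() lists the variables the formula refers to.
all.vars(f)
```

Quoting every name, as in the sprintf call above, is simpler and just as valid; flagging only the offenders mainly helps when you want to report or rename them.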
Hong Ooi answered Nov 15 '22 05:11


You can reduce your data set first

dat_small <- dat[, union("class", x)]  # union() avoids selecting "class" twice if x already contains it

and then use

myformula <- as.formula("class ~ .")

The . stands for all other columns in the data, i.e. every column except class.
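
To check what the . will expand to, you can ask terms() to resolve it against the data. This is a small sketch with a made-up data frame (the columns and values are assumptions for illustration); check.names = FALSE keeps the reserved word "else" as a column name.

```r
# Hypothetical data: one response column and two predictors,
# one of which is named after a reserved word.
dat_small <- data.frame(
  class  = c("a", "b", "a"),
  Name   = c("p", "q", "r"),
  `else` = c(1, 0, 1),
  check.names = FALSE
)

myformula <- class ~ .

# terms() with a data argument replaces the dot by the remaining columns.
expanded <- terms(myformula, data = dat_small)
attr(expanded, "term.labels")
```

Because the column names never pass through parse(text = ...) as a pasted string, reserved words in the data do not trigger the error from the question.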

Rentrop answered Nov 15 '22 06:11


You may try reformulate

 reformulate(setdiff(x, 'class'), response='class')
 #class ~ Name + address + Gender

where 'x' is

  x <- c("Name", "address", "Gender", 'class')

If 'x' contains R keywords, you can do

   reformulate('.', response='class')
   #class ~ .
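
If you would rather keep the explicit list of terms, reformulate can also be given backtick-quoted labels. This is a sketch under the assumption that none of the names contains a backtick; the vector below is invented for illustration.

```r
# Hypothetical names, including the reserved word "else".
x <- c("Name", "else", "Gender", "class")

# Quote every predictor so reserved words are read as plain symbols.
predictors <- sprintf("`%s`", setdiff(x, "class"))
f <- reformulate(predictors, response = "class")

# all.vars() confirms which variables ended up in the formula.
all.vars(f)
```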
akrun answered Nov 15 '22 04:11