Using columns with special characters in formulae in R

I'm trying to build a decision tree with rpart from a data frame that has ~200 columns. Some of the column names contain numbers and some contain special characters (e.g. "/"). When I try to generate the tree I get errors such as the ones below:

R> gg.rpart <- rpart(nospecialchar ~ Special/char, data=temp, method="class")
Error in eval(expr, envir, enclos) : object 'Special' not found
R> gg.rpart <- rpart(nospecialchar ~ "Special/char", data=temp, method="class")
Error in terms.formula(formula, data = data) : invalid model formula in ExtractVars
R> gg.rpart <- rpart(nospecialchar ~ `Special/char`, data=temp, method="class")
Error in `[.data.frame`(frame, predictors) : undefined columns selected

Do I have to change the names to accommodate R or is there some way to pass column names with special characters to R formulae?

Conor asked Feb 14 '12 06:02


2 Answers

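This works if the data frame is created with check.names=FALSE (so the name "A/B" is kept as-is) and the column is quoted with backticks in the formula:

> dat <- data.frame(M=rnorm(10),'A/B'=1:10,check.names=FALSE)

> lm(M~`A/B`,dat)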

Call:
lm(formula = M ~ `A/B`, data = dat)

Coefficients:
(Intercept)        `A/B`  
    -1.0494       0.1214  
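
To illustrate why check.names matters here, the following is a minimal sketch (the data frames `mangled` and `kept` are hypothetical names, not from the original post). By default data.frame() rewrites non-syntactic column names, so a column typed as "A/B" may not exist under that name at all:

```r
# Default behaviour: check.names = TRUE rewrites "A/B" to a syntactic name
mangled <- data.frame(M = 1:3, "A/B" = 4:6)

# check.names = FALSE keeps the name exactly as typed
kept <- data.frame(M = 1:3, "A/B" = 4:6, check.names = FALSE)

names(mangled)  # "M" "A.B"
names(kept)     # "M" "A/B"

# With the original name preserved, backticks let the formula refer to it
fit <- lm(M ~ `A/B`, data = kept)
```

If the column turns out to have been renamed on import, using the rewritten name (here `A.B`) in the formula is the simpler route.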
Wojciech Sobala answered Oct 01 '22 19:10


Joran's comment on my question is the answer: I didn't know that make.names() existed.
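
For reference, a rough sketch of the make.names() approach, assuming a small stand-in data frame (the values and the "2ndcol" column are made up for illustration; only `temp`, `nospecialchar` and "Special/char" come from the question). lm() is used instead of rpart() so the example runs with base R alone:

```r
# Hypothetical stand-in for the real ~200-column data frame
temp <- data.frame(
  nospecialchar  = c(1.2, 2.3, 3.1, 4.8, 5.0),
  "Special/char" = c(2, 4, 5, 9, 10),
  "2ndcol"       = c(1, 0, 1, 0, 1),
  check.names    = FALSE
)

# Keep the original names in case they are needed for reporting later
original_names <- names(temp)

# make.names() turns each name into a syntactically valid one;
# unique = TRUE avoids collisions if two names map to the same result
names(temp) <- make.names(original_names, unique = TRUE)

names(temp)  # "nospecialchar" "Special.char" "X2ndcol"

# The cleaned names can now be used in a formula without any quoting
fit <- lm(nospecialchar ~ Special.char, data = temp)
```

The same renamed data frame should then be usable in the rpart() call from the question with `Special.char` in place of `Special/char`.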

Joran, if you reply as an answer I'll mark you as correct. Cheers!

Conor answered Oct 01 '22 20:10