 

Specifying column names in a data.frame changes spaces to "."

Let's say I have a data.frame, like so:

x <- c(1:10,1:10,1:10,1:10,1:10,1:10,1:10,1:10,1:10,1:10)
df <- data.frame("Label 1"=x,"Label 2"=rnorm(100))

head(df,3)

returns:

  Label.1    Label.2
1       1  1.9825458
2       2 -0.4515584
3       3  0.6397516

How do I get R to stop automagically replacing the space with a period in the column name, i.e., to get "Label 1" instead of "Label.1"?

Brandon Bertelsen asked Aug 05 '10 01:08


2 Answers

You may set check.names = FALSE in data.frame (as well as in read.table):

df <- data.frame("Label 1" = 1:3, "Label 2" = rnorm(3), check.names = FALSE) 

returns:

  Label 1    Label 2
1       1  0.2013347
2       2  1.8823111
3       3 -0.5233811
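A minimal sketch of the same flag on both entry points mentioned above. It assumes the data arrive as CSV; the inline `csv_text` string is a hypothetical stand-in for a real file. `read.csv()` does not list `check.names` itself but forwards it to `read.table()` through `...`.

```r
# Keep spaces in column names when building a data frame directly
df <- data.frame("Label 1" = 1:3, "Label 2" = c(0.5, 1.5, 2.5), check.names = FALSE)
print(names(df))  # "Label 1" "Label 2"

# Hypothetical CSV content standing in for a file on disk
csv_text <- "Label 1,Label 2\n1,0.5\n2,1.5"

# Default check.names = TRUE rewrites the header via make.names()
print(names(read.csv(text = csv_text)))  # "Label.1" "Label.2"

# check.names = FALSE is passed on to read.table() and keeps the header verbatim
from_csv <- read.csv(text = csv_text, check.names = FALSE)
print(names(from_csv))  # "Label 1" "Label 2"
```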

From ?data.frame:

check.names
logical. If TRUE then the names of the variables in the data frame are checked to ensure that they are syntactically valid variable names and are not duplicated. If necessary they are adjusted (by make.names) so that they are.
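The help text says the check also guards against duplicated names. A small sketch of that second effect, using throwaway column names `a`:

```r
# Default check.names = TRUE: duplicates are made unique by make.names(unique = TRUE)
dup_checked <- data.frame(a = 1, a = 2)
print(names(dup_checked))  # "a" "a.1"

# check.names = FALSE: the duplicated names are kept exactly as supplied
dup_raw <- data.frame(a = 1, a = 2, check.names = FALSE)
print(names(dup_raw))  # "a" "a"
```

So turning the check off keeps spaces, but it also lets duplicate column names through, which makes `df$a` ambiguous.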


From ?make.names:

A syntactically valid name consists of letters, numbers and the dot or underline characters and starts with a letter or the dot not followed by a number. Names such as ".2way" are not valid, and neither are the reserved words.

All invalid characters are translated to "."
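To see the rules in action, here is `make.names()` applied to one input per rule. Two behaviours come from the same `?make.names` page but are not quoted above: an `"X"` is prepended when a name cannot start as given, and reserved words get a dot appended.

```r
raw <- c("Label 1",  # space is invalid -> translated to "."
         ".2way",    # dot followed by a number -> "X" prepended
         "2way",     # starts with a number -> "X" prepended
         "if",       # reserved word -> dot appended
         "ok_name")  # already valid -> unchanged
fixed <- make.names(raw)
print(fixed)  # "Label.1" "X.2way" "X2way" "if." "ok_name"
```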


Also, if you need to access a column with an 'invalid' name using $, wrap the name in backticks (`). For example:

df$`Label 1` 
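Backticks are only needed for `$`. The string-based indexing operators take the name as a character value, so the space causes no trouble there. A short sketch comparing the three forms on a small example frame:

```r
df <- data.frame("Label 1" = 1:3, "Label 2" = c(10, 20, 30), check.names = FALSE)

# $ parses its right-hand side as a name, so the non-syntactic name needs backticks
via_dollar <- df$`Label 1`

# [[ ]] and [ , ] take a string, so no backticks are required
via_brackets <- df[["Label 1"]]
via_matrix_style <- df[, "Label 1"]  # single column is dropped to a vector

print(identical(via_dollar, via_brackets))      # TRUE
print(identical(via_dollar, via_matrix_style))  # TRUE
```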
Brandon Bertelsen answered Sep 28 '22 07:09


You don't.

With the space you want, the name would not satisfy the requirements for an identifier that come into play when you use df$column.1; that syntax cannot cope with a space. See the make.names() function for details, or this example:

> make.names(c("Foo Bar", "tic tac"))
[1] "Foo.Bar" "tic.tac"
>
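The identifier problem is a parsing issue, not a lookup failure: R rejects `df$Label 1` before it ever looks at `df`. A sketch that checks this with `parse()`, so no data frame needs to exist:

```r
# A bare space after $ is a syntax error ("unexpected numeric constant")
parsed <- tryCatch(parse(text = "df$Label 1"), error = function(e) e)
print(inherits(parsed, "error"))  # TRUE

# The make.names() form is an ordinary identifier and parses fine
ok_dot <- parse(text = "df$Label.1")
# Quoting the name with backticks also parses
ok_tick <- parse(text = "df$`Label 1`")
print(c(length(ok_dot), length(ok_tick)))  # 1 1
```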

Edit eleven years later: The answer still stands in that R prefers column names that are valid variable names. But R is flexible: if you insist, you can use the other form, but you then need to refer to the not-otherwise-valid-within-the-language column names explicitly:

> x <- c(1:10,1:10,1:10,1:10,1:10,1:10,1:10,1:10,1:10,1:10)
> df <- data.frame("Label 1"=x,"Label 2"=rnorm(100), check.names=FALSE)
> summary( df$`Label 2` )
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
-2.2719 -0.7148 -0.0971 -0.0275  0.6559  2.5820 
>

So by saying check.names=FALSE we override the default (and sensible) check, and by wrapping the identifier in backticks we can access the column.
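Another route, not taken in either answer, is to create the frame with ordinary names and assign the spaced names afterwards. Base R has no `names<-` method for data frames, so the strings are stored as given, without a pass through `make.names()`. This sketch also writes the question's repeated `c(1:10, ...)` as `rep()`, which yields the same integer vector:

```r
x <- rep(1:10, times = 10)  # same values as c(1:10, 1:10, ..., 1:10)
df <- data.frame(x, rnorm(100))

# names<- does not validate, so the spaces survive
names(df) <- c("Label 1", "Label 2")
print(names(df))  # "Label 1" "Label 2"

# Access still needs backticks with $
summary(df$`Label 2`)
```

The caveat from the answer above still applies: the frame works, but every `$` access needs backticks.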

Dirk Eddelbuettel answered Sep 28 '22 09:09