How to deal with spaces in column names?

Tags: r, ggplot2

I know it is preferable for variable names not to contain spaces. However, I need publication-quality charts, so axes and legends need properly formatted labels, i.e. with spaces. For example, during development I might have variables called Pct.On.OAC and Age.Group, but in my final plot I need "% on OAC" and "Age Group" to appear:

'data.frame':   22 obs. of  3 variables:
 $ % on OAC           : Factor w/ 11 levels "0","0.1-9.9",..: 1 2 3 4 5 6 7 8 9 10 ...
 $ Age Group          : Factor w/ 2 levels "Aged 80 and over",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ Number of Practices: int  47 5 33 98 287 543 516 222 67 14 ...

When I plot these:

ggplot(dt.m, aes(x = `% on OAC`, y = `Number of Practices`, fill = `Age Group`)) +
    geom_bar(stat = "identity")

there is no problem. But when I add a facet:

ggplot(dt.m, aes(x = `% on OAC`, y = `Number of Practices`, fill = `Age Group`)) +
    geom_bar(stat = "identity") +
    facet_grid(`Age Group` ~ .)

I get:

Error in `[.data.frame`(base, names(rows)) : undefined columns selected

If I change Age Group to Age.Group, it works fine, but as I said, I don't want the dot to appear in the legend title.

So my questions are:

  1. Is there a workaround for the problem with the facet?
  2. Is there a better general approach to dealing with spaces (and other characters) in variable names when I want the final plot to include them? I suppose I could override them manually, but that seems like a lot of faffing around.
asked Oct 05 '12 10:10 by Robert Long



2 Answers

You asked "Is there a better general approach to dealing with the problem of spaces (and other characters) in variable names", and yes, there are a few:

  • Just don't use them, as things will break, as you experienced here.
  • Use the make.names() function to create safe names; R itself uses it to create identifiers (e.g. by replacing spaces and other invalid characters with dots).
  • If you must, protect the unsafe identifiers with backticks.

Example for the last two points:

R> myvec <- list("foo"=3.14, "some bar"=2.22)
R> myvec$`some bar` * 2
[1] 4.44
R> make.names(names(myvec))
[1] "foo"      "some.bar"
R>
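One way to follow the first two suggestions and still get formatted labels in the output is to keep syntactic names throughout development and apply a single lookup table only at the output stage. The following is a minimal base R sketch under that assumption; `display_names`, `prettify_names()` and the `dev` data (including its `Region` column) are hypothetical names for illustration, not part of R or ggplot2:

```r
# Hypothetical lookup: syntactic name used in code -> label wanted in output.
display_names <- c(
  Pct.On.OAC          = "% on OAC",
  Age.Group           = "Age Group",
  Number.of.Practices = "Number of Practices"
)

# Hypothetical helper: rename only the columns that have an entry in the
# lookup and leave every other column name unchanged.
prettify_names <- function(df, lookup) {
  hit <- names(df) %in% names(lookup)
  names(df)[hit] <- unname(lookup[names(df)[hit]])
  df
}

# Hypothetical development data with safe names.
dev <- data.frame(
  Pct.On.OAC          = c("0", "0.1-9.9"),
  Age.Group           = c("Aged 80 and over", "Aged 80 and over"),
  Number.of.Practices = c(47L, 5L),
  Region              = c("A", "B")  # no entry in the lookup, so it is kept
)

pretty <- prettify_names(dev, display_names)
print(names(pretty))
```

Because the function returns a modified copy, `dev` keeps its safe names for further work, and the labels live in one place instead of being retyped for every table or plot.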
answered Sep 19 '22 12:09 by Dirk Eddelbuettel


This is a "bug" in ggplot2: the internal ggplot2 function quoted_df calls as.data.frame(), which converts the names to syntactically valid names. Those converted names cannot be found in the original data frame, hence the error.
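The renaming step can be reproduced in base R without ggplot2. The sketch below does not reproduce ggplot2's internal code; it only shows that as.data.frame() on a list checks names by default (via make.names()), and that looking the column up under its original name then fails with the same message as in the question:

```r
# A list whose element name contains a space, like the question's facet variable.
vals <- list("Age Group" = c("Aged 80 and over", "Under 80"))

# Default: names are made syntactically valid, so the space becomes a dot.
converted <- as.data.frame(vals)
print(names(converted))

# check.names = FALSE leaves the name exactly as given.
kept <- as.data.frame(vals, check.names = FALSE)
print(names(kept))

# Selecting by the original name from the converted frame now fails.
res <- tryCatch(converted["Age Group"], error = conditionMessage)
print(res)
```

The two name vectors printed are "Age.Group" and "Age Group", and the failed selection reports "undefined columns selected".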

As a reminder:

A syntactically valid name consists of letters, numbers and the dot or underline characters, and starts with a letter or the dot (but the dot cannot be followed by a number).
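make.names() applies these rules mechanically: invalid characters become dots, an "X" is prepended when a name starts with something other than a letter or a dot (or with a dot followed by a digit), and a dot is appended to reserved words. A short illustration, using the question's labels plus a few edge cases:

```r
labels <- c("% on OAC", "Age Group", "Number of Practices", "2way", ".2way", "if")

# Convert each label to a syntactically valid R name.
fixed <- make.names(labels)
print(fixed)
```

This prints "X..on.OAC", "Age.Group", "Number.of.Practices", "X2way", "X.2way" and "if.", which is why a name such as "% on OAC" cannot be recovered from its converted form.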

There is a reason for these rules, and there is also a reason why ggplot2 lets you set labels with labs(). For example, take the following dummy dataset with valid names:

X <- data.frame(
  PonOAC = rep(c('a', 'b', 'c', 'd'), 2),
  AgeGroup = rep(c("over 80", 'under 80'), each = 4),
  NumberofPractices = rpois(8, 70)
)

Adding labs() at the end makes this code work and still gives properly formatted labels:

ggplot(X, aes(x = PonOAC, y = NumberofPractices, fill = AgeGroup)) +
  geom_bar(stat = "identity") +
  facet_grid(AgeGroup ~ .) +
  labs(x = "% on OAC", y = "Number of Practices", fill = "Age Group")

This produces the following plot:

[Figure: the resulting bar chart, faceted by AgeGroup]
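As an alternative for the facet problem itself: newer ggplot2 releases (assumed here to be 3.0.0 or later, where facet_grid() gained a `rows` argument taking vars()) capture facet variables as expressions and evaluate them against the data, so a backtick-quoted name should not need converting. A minimal sketch with hypothetical data (the "Under 80" group and all counts are made up for illustration):

```r
library(ggplot2)

# Hypothetical data; check.names = FALSE keeps the spaced names as written.
dt <- data.frame(
  "% on OAC"            = rep(c("0", "0.1-9.9"), 2),
  "Age Group"           = rep(c("Aged 80 and over", "Under 80"), each = 2),
  "Number of Practices" = c(47L, 5L, 33L, 98L),
  check.names = FALSE
)

# vars() captures the backtick-quoted column name, and geom_col() draws bars
# whose heights are taken directly from the y values.
p <- ggplot(dt, aes(x = `% on OAC`, y = `Number of Practices`, fill = `Age Group`)) +
  geom_col() +
  facet_grid(rows = vars(`Age Group`))

print(p)
```

Here the column names double as axis and legend titles, so no labs() call is needed; the labs() approach above remains the safer choice when the data must keep syntactic names or an older ggplot2 is in use.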

answered Sep 20 '22 12:09 by Joris Meys