How to deal with spaces in column names?

Tags: r, ggplot2

I know it is preferred that variable names do not contain spaces. However, I need publication-quality charts, so axes and legends must have properly formatted labels, i.e. with spaces. For example, during development I might have variables called "Pct.On.OAC" and "Age.Group", but in the final plot I need "% on OAC" and "Age Group" to appear:

'data.frame':   22 obs. of  3 variables:
 $ % on OAC           : Factor w/ 11 levels "0","0.1-9.9",..: 1 2 3 4 5 6 7 8 9 10 ...
 $ Age Group          : Factor w/ 2 levels "Aged 80 and over",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ Number of Practices: int  47 5 33 98 287 543 516 222 67 14 ...

When I plot these:

ggplot(dt.m, aes(x = `% on OAC`, y = `Number of Practices`, fill = `Age Group`)) +
    geom_bar(stat = "identity")

there is no problem. But when I add a facet:

ggplot(dt.m, aes(x = `% on OAC`, y = `Number of Practices`, fill = `Age Group`)) +
    geom_bar(stat = "identity") +
    facet_grid(`Age Group` ~ .)

I get:

Error in `[.data.frame`(base, names(rows)) : undefined columns selected

If I change `Age Group` to Age.Group, it works fine, but as I said, I don't want the dot to appear in the legend title.

So my questions are:

1. Is there a workaround for the problem with the facet?
2. Is there a better general approach to dealing with spaces (and other characters) in variable names when I want the final plot to include them? I suppose I could override the labels manually, but that seems like a lot of faffing around.


asked Oct 05 '12 10:10

Robert Long

2 Answers

You asked "Is there a better general approach to dealing with the problem of spaces (and other characters) in variable names" and yes there are a few:

- Just don't use them, as things will break, as you experienced here.
- Use the make.names() function to create safe names; R itself uses it to create identifiers (e.g. it replaces spaces and other invalid characters with dots).
- If you must, protect the unsafe identifiers with backticks.

Example for the last two points:

R> myvec <- list("foo"=3.14, "some bar"=2.22)
R> myvec$`some bar` * 2
[1] 4.44
R> make.names(names(myvec))
[1] "foo"      "some.bar"
R>
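Here is a short base-R sketch of the same two points, using the column labels from the question. It shows that data.frame() runs make.names() on column names by default (through its check.names argument), and that check.names = FALSE keeps the spaces, after which backticks are needed. The label_lookup vector is a hypothetical addition, not part of the answer above. It illustrates one way to keep the safe names in the data and still recover the display labels later, for example for labs().

```r
# Display labels taken from the question
display <- c("% on OAC", "Age Group", "Number of Practices")

# make.names() converts each label to a syntactically valid name;
# unique = TRUE also guards against two labels collapsing to the same name
safe <- make.names(display, unique = TRUE)
print(safe)

# data.frame() applies make.names() to column names by default (check.names = TRUE)
d1 <- data.frame(`Age Group` = c("over 80", "under 80"))
print(names(d1))   # "Age.Group"

# check.names = FALSE keeps the name as written; it then has to be backtick-quoted
d2 <- data.frame(`Age Group` = c("over 80", "under 80"), check.names = FALSE)
print(names(d2))   # "Age Group"
print(d2$`Age Group`)

# Hypothetical lookup: safe column name -> label to show on a plot
label_lookup <- setNames(display, safe)
print(label_lookup[["Age.Group"]])   # "Age Group"
```

Keeping the lookup next to the data means the pretty labels are written once rather than retyped in every plot call.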


answered Sep 19 '22 12:09

Dirk Eddelbuettel

This is a "bug" in the package ggplot2 that comes from the fact that the function as.data.frame() in the internal ggplot2 function quoted_df converts the names to syntactically valid names. These syntactically valid names cannot be found in the original dataframe, hence the error.

To remind you:

Syntactically valid names consist of letters, numbers, and the dot or underscore characters, and start with a letter or a dot (but a leading dot cannot be followed by a number).
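The failure can be reproduced in base R without ggplot2. This is a simplified sketch that assumes, as the answer describes, that the old code built a data frame from the facet variables with as.data.frame(). It is not ggplot2's actual quoted_df source. The helper is_syntactic() is hypothetical. It returns TRUE when make.names() would leave a name unchanged, which is a quick way to check a name against the rule quoted above.

```r
# Hypothetical helper: TRUE when make.names() would leave the name unchanged
is_syntactic <- function(x) make.names(x) == x

# "Age Group" has a space, ".2way" has a dot followed by a digit, "_x" starts with "_"
print(is_syntactic(c("AgeGroup", "Age.Group", "Age Group", ".2way", "_x")))

# The user's data frame, with a space in the column name
df_orig <- data.frame(`Age Group` = c("over 80", "under 80"), check.names = FALSE)

# as.data.frame() on a named list checks names by default, so the column is renamed
converted <- as.data.frame(list(`Age Group` = df_orig$`Age Group`))
print(names(converted))   # "Age.Group"

# Looking the converted name up in the original data frame then fails
msg <- tryCatch(df_orig[names(converted)], error = function(e) conditionMessage(e))
print(msg)                # "undefined columns selected"
```

The last step gives the same message as the error in the question.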

There's a reason R insists on syntactically valid names. There's also a reason why ggplot2 lets you set labels with labs(). For example, take the following dummy dataset with valid names:

X <- data.frame(
  PonOAC = rep(c('a','b','c','d'), 2),
  AgeGroup = rep(c("over 80", 'under 80'), each = 4),
  NumberofPractices = rpois(8, 70)
)

Adding labs() at the end makes this code work with the formatted labels:

ggplot(X, aes(x = PonOAC, y = NumberofPractices, fill = AgeGroup)) +
  geom_bar(stat = "identity") +
  facet_grid(AgeGroup ~ .) +
  labs(x = "% on OAC", y = "Number of Practices", fill = "Age Group")

To produce:

[Image: the resulting faceted bar chart]
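For readers on a current ggplot2 release, here is a hedged sketch. Neither answer states this. My understanding is that ggplot2 3.0.0 and later evaluate facet specifications as quoted expressions against the data, so a backtick-quoted name should work in facet_grid(), either through vars() or the formula form, without renaming. Please check this against your installed version. The data reuse the dummy values from the answer, with the question's labels as column names. geom_col() replaces geom_bar() because recent versions reject a mapped y aesthetic with geom_bar()'s default count stat.

```r
library(ggplot2)

set.seed(1)  # only so the random counts are reproducible
dt.m <- data.frame(
  `% on OAC`            = rep(c("a", "b", "c", "d"), 2),
  `Age Group`           = rep(c("over 80", "under 80"), each = 4),
  `Number of Practices` = rpois(8, 70),
  check.names = FALSE   # keep the spaces and the % sign in the column names
)

p <- ggplot(dt.m, aes(x = `% on OAC`, y = `Number of Practices`, fill = `Age Group`)) +
  geom_col() +                           # bar heights taken directly from y
  facet_grid(rows = vars(`Age Group`))   # formula form: facet_grid(`Age Group` ~ .)

print(p)
```

If this works on your version, the axis and legend titles should default to the column names as written, so no labs() call is needed. On older versions, the labs() approach above remains the safe route.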

answered Sep 20 '22 12:09

Joris Meys
