I know it is preferred that variable names do not contain spaces. I have a situation where I need publication-quality charts, so axes and legends need properly formatted labels, i.e. with spaces. So, for example, in development I might have variables called Pct.On.OAC and Age.Group, but in my final plot I need "% on OAC" and "Age Group" to appear:
'data.frame': 22 obs. of 3 variables:
 $ % on OAC           : Factor w/ 11 levels "0","0.1-9.9",..: 1 2 3 4 5 6 7 8 9 10 ...
 $ Age Group          : Factor w/ 2 levels "Aged 80 and over",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ Number of Practices: int 47 5 33 98 287 543 516 222 67 14 ...
When I plot these:
ggplot(dt.m, aes(x=`% on OAC`, y=`Number of Practices`, fill=`Age Group`)) + geom_bar()
there is no problem. But when I add a facet:
ggplot(dt.m, aes(x=`% on OAC`,y=`Number of Practices`, fill=`Age Group`)) + geom_bar() + facet_grid(`Age Group`~ .)
I get
Error in `[.data.frame`(base, names(rows)) : undefined columns selected
If I change `Age Group` to Age.Group then it works fine, but as I said, I don't want the dot to appear in the legend title.
So my questions are:
To select a column whose name contains spaces, wrap the name in backticks (`). On most US keyboard layouts the backtick shares a key with the tilde (~).
Column names can contain any valid characters (for example, spaces).
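As a minimal base R sketch of the backtick syntax (the data frame dt and its values are made up for illustration), a column whose name contains spaces can be reached with backticks after $, or with the name given as a string inside [[ ]] or [ , ]:

```r
# Hypothetical data; check.names = FALSE keeps the spaces in the column names
dt <- data.frame(`Age Group` = c("Aged 80 and over", "Under 80"),
                 `Number of Practices` = c(47L, 5L),
                 check.names = FALSE)

# Backticks let a non-syntactic name follow the $ operator
dt$`Number of Practices`

# A plain character string works with [[ ]] and [ , ]
dt[["Age Group"]]
total <- sum(dt[, "Number of Practices"])
```

Note that data.frame() only keeps such names when check.names = FALSE is passed; by default it rewrites them.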
The easiest way to replace spaces in column names is the clean_names() function from the janitor package. It creates syntactically valid column names, replacing blanks with underscores, and it can be combined with the %>% pipe operator used throughout the tidyverse.
In Python's pandas (for example in DataFrame.query), you can likewise refer to column names that contain spaces or operators by surrounding them in backticks. The same applies to names that start with a digit or that are a Python keyword; in short, to any name that is not a valid Python identifier.
You asked "Is there a better general approach to dealing with the problem of spaces (and other characters) in variable names" and yes there are a few:
the make.names() function to create safe names; R uses this itself to create identifiers (e.g. by replacing spaces with dots)
Example for the last two points:
R> myvec <- list("foo"=3.14, "some bar"=2.22)
R> myvec$'some bar' * 2
[1] 4.44
R> make.names(names(myvec))
[1] "foo"      "some.bar"
R>
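Building on make.names(), one possible pattern is to work with safe names during development and keep a lookup from each safe name back to its display label for the final plot. This is only a sketch: the object names labels, safe and label_lookup and the sample values are assumptions, not part of the original question.

```r
# Display labels as they should appear in the final plot
labels <- c("% on OAC", "Age Group", "Number of Practices")

# Hypothetical data that arrives with the display labels as column names
dt <- data.frame(c("0", "0.1-9.9"),
                 c("Aged 80 and over", "Aged 80 and over"),
                 c(47L, 5L))
names(dt) <- labels

# Derive syntactically valid names and remember which label each came from
safe <- make.names(labels, unique = TRUE)
label_lookup <- setNames(labels, safe)
names(dt) <- safe

# Work with the safe names, then recover a label when it is needed
age_label <- label_lookup[["Age.Group"]]
```

The recovered labels can then be passed to whatever sets the axis and legend titles at the end.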
This is a "bug" in the package ggplot2 that comes from the fact that the function as.data.frame() in the internal ggplot2 function quoted_df converts the names to syntactically valid names. These syntactically valid names cannot be found in the original dataframe, hence the error.
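The renaming can be reproduced in base R without ggplot2. The sketch below does not call the ggplot2 internals; it only mimics the mechanism described above, using a made-up one-column data frame called orig:

```r
# Hypothetical data frame that keeps a name containing a space
orig <- data.frame(`Age Group` = c("Aged 80 and over", "Under 80"),
                   check.names = FALSE)

# as.data.frame() on a list checks the names by default and rewrites them
mangled <- names(as.data.frame(as.list(orig)))

# Looking up the rewritten name in the original data frame fails;
# tryCatch() captures the error message instead of stopping
msg <- tryCatch(orig[, mangled], error = function(e) conditionMessage(e))
```

Here mangled holds "Age.Group", which is not a column of orig, so the lookup raises the same "undefined columns selected" error.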
To remind you: a syntactically valid name consists of letters, numbers and the dot or underline characters, and starts with a letter or the dot (but the dot cannot be followed by a number).
There's a reason for that. There's also a reason why ggplot allows you to set labels using labs(), e.g. using the following dummy dataset with valid names:
X <- data.frame(
  PonOAC = rep(c('a','b','c','d'), 2),
  AgeGroup = rep(c("over 80", 'under 80'), each = 4),
  NumberofPractices = rpois(8, 70)
)
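You can use labs() at the end to make this code work:
# stat = "identity" plots the y values as given; recent ggplot2 versions
# reject a y aesthetic when geom_bar() uses its default counting stat
ggplot(X, aes(x = PonOAC, y = NumberofPractices, fill = AgeGroup)) +
  geom_bar(stat = "identity") +
  facet_grid(AgeGroup ~ .) +
  labs(x = "% on OAC", y = "Number of Practices", fill = "Age Group")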
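This produces the faceted bar chart with "% on OAC", "Number of Practices" and "Age Group" as the axis and legend titles.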
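For readers on a newer ggplot2 (3.0.0 or later, to the best of my knowledge), facets can also be given with vars(), which accepts backticked names directly. The sketch below assumes such a version is installed; the data values and the "Aged under 80" level are made up for illustration.

```r
library(ggplot2)

# Hypothetical data with the display labels as column names
dt <- data.frame(
  `% on OAC` = rep(c("0", "0.1-9.9", "10-19.9", "20-29.9"), 2),
  `Age Group` = rep(c("Aged 80 and over", "Aged under 80"), each = 4),
  `Number of Practices` = c(47L, 5L, 33L, 98L, 287L, 543L, 516L, 222L),
  check.names = FALSE
)

# geom_col() draws bars from the y values as given;
# vars() takes the backticked column name for the facet rows
p <- ggplot(dt, aes(x = `% on OAC`, y = `Number of Practices`,
                    fill = `Age Group`)) +
  geom_col() +
  facet_grid(rows = vars(`Age Group`))
```

With this approach the axis, legend and facet strip text come straight from the column names, so no labs() call is needed.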