 

Using ggplot2 with columns that have spaces in their names

Tags:

r

ggplot2

I have the following code for my data frame:

df <- as.data.frame(A)
colnames(df)<- c("Sum of MAE", "Company")
df <- na.omit(df)
df2 <- df[order(df[,1]),]
df2 <- head(df2, n=10)
ggplot(df2, aes_string("Sum of MAE", "Company", group=1) + geom_line())
print(df2)

This is the structure of the data:

             Sum of MAE Company
606   0.030156758080105    COCO
182  0.0600065426668421    APWC
836  0.0602272459239397     EDS
1043 0.0704327240953608    FREE
2722               0.09   VLYWW
1334 0.0900000000000001    IKAN
2420  0.104746328560384     SPU
860   0.106063964745531    ELON
2838  0.108373386847075    WTSL
1721  0.110086738825851    MTSL

The ggplot doesn't seem to be working. After a litany of errors, the current one I'm getting is:

Error in parse(text = x) : <text>:1:5: unexpected symbol
1: Sum of

Can someone help me get ggplot2 working?

Jumper asked Mar 18 '15 22:03



1 Answer

This is a good reason why you should always make sure you have valid column names. First, here's an easier-to-reproduce version of your dataset:

library(ggplot2)

df2 <- data.frame(`Sum of MAE` = c(0.030156758080105, 0.0600065426668421, 
   0.0602272459239397, 0.0704327240953608, 0.09, 0.0900000000000001, 
   0.104746328560384, 0.106063964745531, 0.108373386847075, 0.110086738825851
   ), Company = c("COCO", "APWC", "EDS", "FREE", "VLYWW", "IKAN", "SPU", "ELON", 
   "WTSL", "MTSL"), check.names = FALSE)

ggplot(df2, aes_string("Sum of MAE", "Company", group=1) + geom_line())
# Error in parse(text = x) : <text>:1:5: unexpected symbol
# 1: Sum of
#         ^

The problem is that aes_string() uses parse() to turn your text into a proper R symbol that can be resolved within the data.frame. "Sum of MAE" is not valid R syntax when parsed: it doesn't resolve to a single symbol name. If you use "bad" names like that, you can escape them with back-ticks so the whole expression (spaces and all) is treated as one symbol. So you can do:

ggplot(df2, aes_string("`Sum of MAE`", "Company", group=1)) + geom_line()
# or
ggplot(df2, aes(`Sum of MAE`, Company, group=1)) + geom_line()
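To see why the back-ticks help, you can call parse() yourself, since that is the step aes_string() relies on. This small sketch uses only base R: the bare text fails with the same "unexpected symbol" error, while the back-ticked text becomes a single symbol (class "name"), identical to what as.name() builds without parsing.

```r
# aes_string() hands its text to parse(). An unquoted name containing
# spaces is not valid R syntax, so parsing it fails:
msg <- tryCatch(parse(text = "Sum of MAE"),
                error = function(e) conditionMessage(e))
cat(msg, "\n")  # an "unexpected symbol" error, as in the question

# Wrapped in back-ticks, the same text parses to a single symbol:
parsed <- parse(text = "`Sum of MAE`")[[1]]
print(class(parsed))         # "name"
print(as.character(parsed))  # "Sum of MAE"

# as.name() builds the same symbol without going through the parser.
print(identical(parsed, as.name("Sum of MAE")))  # TRUE
```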

but really it would be better to stick to valid column names for your data.frame rather than bypassing data.frame()'s name checks by assigning names with colnames().
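If you already have a data frame with invalid names, base R's make.names() is the function data.frame() uses to repair names when check.names = TRUE, and you can call it yourself. A minimal sketch using the first two rows of the question's data (the placeholder column names `a` and `b` are just for construction):

```r
# Two rows from the question's data; colnames<- does no name checking,
# so the space in "Sum of MAE" is kept.
df <- data.frame(a = c(0.030156758080105, 0.0600065426668421),
                 b = c("COCO", "APWC"))
colnames(df) <- c("Sum of MAE", "Company")
print(names(df))  # "Sum of MAE" "Company"

# make.names() replaces characters that are invalid in R names with ".".
names(df) <- make.names(names(df))
print(names(df))  # "Sum.of.MAE" "Company"
```

Alternatively, simply pick a valid name such as `Sum_of_MAE` in the original colnames() call.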

If you were changing the column names to get "nicer" axis labels, you should probably do that with xlab() instead. For example:

df3 <- data.frame(df2)
names(df3)
# [1] "Sum.of.MAE" "Company" 
ggplot(df3, aes(Sum.of.MAE, Company, group=1)) + 
    geom_line() + 
    xlab("Sum of MAE values")
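If the column name is only available as a string (the usual reason to reach for aes_string()), newer ggplot2 releases deprecate aes_string() in favour of tidy evaluation, such as the `.data` pronoun inside aes(). A sketch, assuming a reasonably recent ggplot2; `x_col` is a hypothetical variable standing in for, say, an argument to your own plotting function:

```r
library(ggplot2)

# The answer's example data, keeping the space in the column name.
df2 <- data.frame(`Sum of MAE` = c(0.030156758080105, 0.0600065426668421,
  0.0602272459239397, 0.0704327240953608, 0.09, 0.0900000000000001,
  0.104746328560384, 0.106063964745531, 0.108373386847075, 0.110086738825851),
  Company = c("COCO", "APWC", "EDS", "FREE", "VLYWW", "IKAN", "SPU", "ELON",
  "WTSL", "MTSL"), check.names = FALSE)

# Hypothetical: the column name arrives as a string.
x_col <- "Sum of MAE"

# .data[[x_col]] looks the string up as a column of df2 directly,
# so nothing is parsed and the spaces cause no error.
p <- ggplot(df2, aes(x = .data[[x_col]], y = Company, group = 1)) +
  geom_line() +
  xlab(x_col)

print(p)
```

Because the lookup is by string rather than by parsed code, this works the same for valid and invalid column names.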

MrFlick answered Sep 29 '22 20:09