
dplyr: nonstandard column names (white space, punctuation, starts with numbers)

Tags:

dataframe

r

dplyr

df <- structure(list(`a a` = 1:3, `a b` = 2:4),
                .Names = c("a a", "a b"),
                row.names = c(NA, -3L),
                class = "data.frame")

and the data looks like

  a a a b
1   1   2
2   2   3
3   3   4

The following call to select

select(df, 'a a') 

gives

Error in abs(ind[ind < 0]) :
  non-numeric argument to mathematical function

How can I select "a a" and/or rename it to something without space using select? I know the following approaches:

  1. names(df)[1] <- "a"
  2. select(df, a=1)
  3. select(df, ends_with("a"))

But if I am working on a large data set, how can I get an exact match without knowing the column's index number or having to account for similar column names?

Flux asked Apr 03 '14 15:04


2 Answers

You can select the variable by wrapping its name in backticks (`).

select(df, `a a`)
#   a a
# 1   1
# 2   2
# 3   3

However, if your main objective is to rename the column, you can use rename from the plyr package, which accepts the old name in either double quotes ("") or backticks (``).

rename(df, replace = c("a a" = "a"))
rename(df, replace = c(`a a` = "a"))

Or in base R:

names(df)[names(df) == "a a"] <- "a" 

For a more thorough description of the use of the various quotes, see ?Quotes. The 'Names and Identifiers' section is especially relevant here:

"other [syntactically invalid] names can be used provided they are quoted. The preferred quote is the backtick".

See also ?make.names about valid names.
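
If changing the names is acceptable, make.names can convert every nonstandard name to a syntactically valid one in a single step. The sketch below uses the two column names from the question plus a hypothetical third name, "1st col", added only to illustrate a name that starts with a digit:

```r
# Names from the question, plus one hypothetical name starting with a digit.
raw_names <- c("a a", "a b", "1st col")

# make.names() replaces invalid characters with "." and prepends "X"
# when the name would otherwise start with an invalid character.
clean_names <- make.names(raw_names, unique = TRUE)
print(clean_names)
# [1] "a.a"      "a.b"      "X1st.col"
```

For the data frame in the question this would be applied as names(df) <- make.names(names(df), unique = TRUE), after which select(df, a.a) needs no quoting.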

See also this post about renaming in dplyr
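
As a rough sketch of renaming within dplyr itself, the backticked name can be passed to dplyr's own rename, which takes new_name = old_name pairs. The data frame is rebuilt here with check.names = FALSE so the example is self-contained; the object name renamed is only illustrative:

```r
library(dplyr)

# Same data as in the question; check.names = FALSE keeps the spaces in the names.
df <- data.frame(`a a` = 1:3, `a b` = 2:4, check.names = FALSE)

# dplyr::rename() uses new_name = old_name; the old name goes in backticks.
renamed <- rename(df, a = `a a`)
print(names(renamed))
# [1] "a"   "a b"
```

select(df, a = `a a`) works the same way, but keeps only the renamed column.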

Henrik answered Sep 22 '22 23:09


Some alternatives to backticks, good as of dplyr 0.5.0, the current version as of this writing.

If you want to select a column programmatically from a string, without renaming it or using paste/sprintf to wrap the name in backticks, you can use as.name together with select_, the standard-evaluation version of select:

dplyr::select_(df, as.name("a a"))

Many dplyr functions have a standard-evaluation counterpart whose name ends in an underscore. In the case of select specifically, you can also use the regular (non-standard evaluation) version together with the select helper one_of. See ?dplyr::select_helpers for documentation:

dplyr::select(df, dplyr::one_of("a a")) 
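
The calls above are tied to dplyr 0.5.0. Later releases deprecated the underscore verbs such as select_, and one_of has since been superseded by all_of and any_of. A minimal sketch of the same exact-match selection, assuming dplyr 1.0.0 or newer is installed (the variable name wanted is hypothetical):

```r
library(dplyr)

# Same data as in the question; check.names = FALSE keeps the spaces in the names.
df <- data.frame(`a a` = 1:3, `a b` = 2:4, check.names = FALSE)

# Column name held in a string, e.g. read from a config file or user input.
wanted <- "a a"

# all_of() matches the name exactly and raises an error if the column is missing.
picked <- select(df, all_of(wanted))
print(names(picked))
# [1] "a a"
```

any_of takes the same kind of character vector but silently skips names that are not present, which may be preferable when the column is optional.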
Andy answered Sep 21 '22 23:09