
Using dplyr's rename() including variable names not in data set

Tags: r, dplyr, plyr

I am trying to transition some plyr code to dplyr, and getting stuck with the new functionality of rename() in dplyr. I'd like to be able to reuse a single rename() expression for a set of datasets with overlapping but not identical original names. For example,

sample1 <- data.frame(A=1:10, B=letters[1:10])

sample2 <- data.frame(B=11:20, C=letters[11:20])

And then,

 rename(sample1, var1 = A, var2 = B, var3 = C)

I would like the result to be that variable A is renamed var1, and B is renamed var2, not adding a var3 in this case. Instead, I get

Error: Unknown variables: C.

In contrast, the plyr syntax would let me use

rename(sample1, c("A" = "var1", "B" = "var2", "C" = "var3"))
rename(sample2, c("A" = "var1", "B" = "var2", "C" = "var3"))

and not throw an error. Is there a way to get the same result in dplyr without getting the Unknown variables error?

AmeliaMN asked Feb 25 '15 01:02



4 Answers

Completely ignoring your actual request on how to do this with dplyr, I would like to suggest a different approach using a lookup table:

sample1 <- data.frame(A=1:10, B=letters[1:10])
sample2 <- data.frame(B=11:20, C=letters[11:20])

rename_map <- c("A"="var1",
                "B"="var2",
                "C"="var3")

names(sample1) <- rename_map[names(sample1)]
str(sample1)

names(sample2) <- rename_map[names(sample2)]
str(sample2)

Fundamentally the algorithm is simple:

  1. Build a lookup table that maps the current variable names to the desired names.
  2. Index the lookup table with names() of the data frame and assign the result back to names(), so each column receives its mapped name.
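
To make step 2 concrete, here is a minimal sketch of what indexing a named vector by name returns. The names looked_up and missing_lookup are only for illustration, and "Z" stands in for any column name that is not in the map.

```r
rename_map <- c(A = "var1", B = "var2", C = "var3")

# Indexing by name returns the mapped values in the order requested.
looked_up <- rename_map[c("B", "C")]
print(unname(looked_up))

# A name that is absent from the map gives NA instead of an error,
# so it is worth checking the result before assigning it to names().
missing_lookup <- rename_map[c("B", "Z")]
print(anyNA(missing_lookup))
```

So the lookup works cleanly only when every column name of the data frame is present in names(rename_map).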

EDIT: As per Hadley's suggestion, I used a named vector instead of a list, which makes life much easier. I always forget about named vectors :(

earino answered Jan 02 '23 07:01


# no need to use rename

oldnames <- unique(c(names(sample1), names(sample2)))
newnames <- c("var1", "var2", "var3")
name_df <- data.frame(oldnames, newnames, stringsAsFactors = FALSE)
mydata <- list(sample1, sample2)  # combine the two datasets in a list

# match() looks up each column by name, so the column order does not matter
finaldata <- lapply(mydata, function(i) {
  colnames(i) <- name_df$newnames[match(colnames(i), name_df$oldnames)]
  i
})
> finaldata
[[1]]
   var1 var2
1     1    a
2     2    b
3     3    c
4     4    d
5     5    e
6     6    f
7     7    g
8     8    h
9     9    i
10   10    j

[[2]]
   var2 var3
1    11    k
2    12    l
3    13    m
4    14    n
5    15    o
6    16    p
7    17    q
8    18    r
9    19    s
10   20    t
Metrics answered Jan 02 '23 07:01


With dplyr, we can use a named vector with the old names as values and the new names as names, then unquote only the values in name_vec that match column names in your dataset. rename() supports unquoting character vectors, so there is no need to convert them to symbols with sym() beforehand:

library(dplyr)

name_vec <- c(var1 = "A", var2 = "B", var3 = "C")

sample1 %>%
  rename(!!name_vec[name_vec %in% names(.)])

sample2 %>%
  rename(!!name_vec[name_vec %in% names(.)])
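
As a possible variation on the above, and assuming dplyr 1.0.0 or later (where rename() accepts tidyselect helpers), any_of() can do the filtering for you. This is a sketch rather than part of the original answer; renamed1 and renamed2 are just illustrative names.

```r
library(dplyr)

sample1 <- data.frame(A = 1:10, B = letters[1:10])
sample2 <- data.frame(B = 11:20, C = letters[11:20])

# New names are the vector's names, existing column names are its values.
name_vec <- c(var1 = "A", var2 = "B", var3 = "C")

# any_of() selects only the columns that exist, so absent ones are skipped
# instead of raising an error.
renamed1 <- sample1 %>% rename(any_of(name_vec))
renamed2 <- sample2 %>% rename(any_of(name_vec))

names(renamed1)
names(renamed2)
```

If you would rather get an error when a column is missing, all_of() is the strict counterpart.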

Also, with setNames:

name_vec <- c(A = "var1", B = "var2", C = "var3")

sample1 %>%
  setNames(name_vec[names(.)])

sample2 %>%
  setNames(name_vec[names(.)])

Output:

   var1 var2
1     1    a
2     2    b
3     3    c
4     4    d
5     5    e
6     6    f
7     7    g
8     8    h
9     9    i
10   10    j

   var2 var3
1    11    k
2    12    l
3    13    m
4    14    n
5    15    o
6    16    p
7    17    q
8    18    r
9    19    s
10   20    t
acylam answered Jan 02 '23 08:01


I’ve used @earino’s answer before myself, but discovered that it can be unsafe. If column names of the data frame are missing in the (names of the) named vector, those column names are silently replaced with NA and that is certainly not what you want.

d1 <- data.frame(A = 1:10, B = letters[1:10], stringsAsFactors = FALSE)

rename_vec <- c("B" = "var2", "C" = "var3")

names(d1) <- rename_vec[names(d1)]
str(d1)
#> 'data.frame':    10 obs. of  2 variables:
#>  $ NA  : int  1 2 3 4 5 6 7 8 9 10
#>  $ var2: chr  "a" "b" "c" "d" ...

The same can happen if you run names(d1) <- rename_vec[names(d1)] twice by accident: on the second run, none of the colnames(d1) are in names(rename_vec), so every column name becomes NA.

names(d1) <- rename_vec[names(d1)]
str(d1)
#> 'data.frame':    10 obs. of  2 variables:
#>  $ NA: int  1 2 3 4 5 6 7 8 9 10
#>  $ NA: chr  "a" "b" "c" "d" ...

To avoid this, we just need to rename only those columns whose names appear both in the data frame and in the names of the rename vector.

d2 <- data.frame(B1 = 1:10, B = letters[1:10], stringsAsFactors = FALSE)

sel <- is.element(colnames(d2), names(rename_vec))
names(d2)[sel] <- rename_vec[names(d2)][sel]
str(d2)
#> 'data.frame':    10 obs. of  2 variables:
#>  $ B1  : int  1 2 3 4 5 6 7 8 9 10
#>  $ var2: chr  "a" "b" "c" "d" ...
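
If you need this for several data frames, the selection step could be wrapped in a small function. This is only a sketch: rename_known is a hypothetical helper name, not something from base R or dplyr, and it assumes map is a named character vector with old names as names and new names as values.

```r
# Hypothetical helper: rename only the columns that appear in names(map)
# and leave every other column name untouched.
rename_known <- function(df, map) {
  hit <- names(df) %in% names(map)
  names(df)[hit] <- unname(map[names(df)[hit]])
  df
}

rename_vec <- c("B" = "var2", "C" = "var3")
d3 <- data.frame(B1 = 1:10, B = letters[1:10], stringsAsFactors = FALSE)

once  <- rename_known(d3, rename_vec)
twice <- rename_known(once, rename_vec)  # running it again changes nothing

names(once)
names(twice)
```

Because unmatched columns are skipped rather than looked up, running it twice by accident no longer produces NA names.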

UPDATE: I initially had a solution here that involved string replacement, which turned out to be unsafe as well, because it allowed for partial matching. This one is better, I think.

dpprdan answered Jan 02 '23 08:01