Using dplyr's rename() including variable names not in data set

Tags:

r

dplyr

plyr

I am trying to transition some plyr code to dplyr and am getting stuck on the new behavior of rename() in dplyr. I'd like to be able to reuse a single rename() expression for a set of datasets with overlapping but not identical original names. For example,

sample1 <- data.frame(A=1:10, B=letters[1:10])

sample2 <- data.frame(B=11:20, C=letters[11:20])

And then,

 rename(sample1, var1 = A, var2 = B, var3 = C)

I would like the result to be that variable A is renamed var1, and B is renamed var2, not adding a var3 in this case. Instead, I get

Error: Unknown variables: C.

In contrast, the plyr syntax would let me use

rename(sample1, c("A" = "var1", "B" = "var2", "C" = "var3"))
rename(sample2, c("A" = "var1", "B" = "var2", "C" = "var3"))

and not throw an error. Is there a way to get the same result in dplyr without getting the Unknown variables error?

asked Feb 25 '15 01:02

AmeliaMN

4 Answers

Completely ignoring your actual request on how to do this with dplyr, I would like to suggest a different approach using a lookup table:

sample1 <- data.frame(A=1:10, B=letters[1:10])
sample2 <- data.frame(B=11:20, C=letters[11:20])

rename_map <- c("A"="var1",
                "B"="var2",
                "C"="var3")

names(sample1) <- rename_map[names(sample1)]
str(sample1)

names(sample2) <- rename_map[names(sample2)]
str(sample2)

Fundamentally the algorithm is simple:

1. Build a lookup table that maps current variable names to desired names.
2. Using the names() function, index the lookup table by the data frame's current column names and assign the resulting new names back to the columns.

EDIT: As per Hadley's suggestion, I used a named vector instead of a list, which makes life much easier. I always forget about named vectors :(

answered Jan 02 '23 07:01

earino

# no need to use rename

oldnames <- unique(c(names(sample1), names(sample2)))
newnames <- c("var1", "var2", "var3")
name_df <- data.frame(oldnames, newnames)
mydata <- list(sample1, sample2)  # combine the two datasets into a list

# one-liner: pick the new names whose old names occur in each data frame
finaldata <- lapply(mydata, function(i) {
  colnames(i) <- name_df[name_df[, 1] %in% colnames(i), 2]
  return(i)
})
> finaldata
[[1]]
   var1 var2
1     1    a
2     2    b
3     3    c
4     4    d
5     5    e
6     6    f
7     7    g
8     8    h
9     9    i
10   10    j

[[2]]
   var2 var3
1    11    k
2    12    l
3    13    m
4    14    n
5    15    o
6    16    p
7    17    q
8    18    r
9    19    s
10   20    t

answered Jan 02 '23 07:01

Metrics

With dplyr, we can use a named vector with old names as values and new names as names, then unquote only the values in name_vec that match names in your dataset. rename supports unquoting character vectors, so there is no need to convert them to sym beforehand:

library(dplyr)

name_vec <- c(var1 = "A", var2 = "B", var3 = "C")

sample1 %>%
  rename(!!name_vec[name_vec %in% names(.)])

sample2 %>%
  rename(!!name_vec[name_vec %in% names(.)])

Also, with setNames:

name_vec <- c(A = "var1", B = "var2", C = "var3")

sample1 %>%
  setNames(name_vec[names(.)])

sample2 %>%
  setNames(name_vec[names(.)])

Output:

   var1 var2
1     1    a
2     2    b
3     3    c
4     4    d
5     5    e
6     6    f
7     7    g
8     8    h
9     9    i
10   10    j

   var2 var3
1    11    k
2    12    l
3    13    m
4    14    n
5    15    o
6    16    p
7    17    q
8    18    r
9    19    s
10   20    t
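
On a recent dplyr (1.0.0 or later, which is an assumption about the installed version), a possibly simpler variant of the same idea is to wrap the named vector in tidyselect's any_of(), which ignores entries whose old name is not present in the data. A minimal sketch, reusing the first name_vec layout from above (new names as names, old names as values):

```r
library(dplyr)

sample1 <- data.frame(A = 1:10, B = letters[1:10])
sample2 <- data.frame(B = 11:20, C = letters[11:20])

# new names are the vector's names, old names are its values
name_vec <- c(var1 = "A", var2 = "B", var3 = "C")

# any_of() silently skips entries whose old name is not a column
renamed1 <- sample1 %>% rename(any_of(name_vec))
renamed2 <- sample2 %>% rename(any_of(name_vec))

names(renamed1)
names(renamed2)
```

Because any_of() only drops the missing entries, columns that are not mentioned in name_vec keep their original names instead of becoming NA.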

answered Jan 02 '23 08:01

acylam

I’ve used @earino’s answer before myself, but discovered that it can be unsafe. If column names of the data frame are missing in the (names of the) named vector, those column names are silently replaced with NA and that is certainly not what you want.

d1 <- data.frame(A = 1:10, B = letters[1:10], stringsAsFactors = FALSE)

rename_vec <- c("B" = "var2", "C" = "var3")

names(d1) <- rename_vec[names(d1)]
str(d1)
#> 'data.frame':    10 obs. of  2 variables:
#>  $ NA  : int  1 2 3 4 5 6 7 8 9 10
#>  $ var2: chr  "a" "b" "c" "d" ...

The same can happen if you run names(d1) <- rename_vec[names(d1)] twice by accident, because the second time around none of the colnames(d1) are in names(rename_vec).

names(d1) <- rename_vec[names(d1)]
str(d1)
#> 'data.frame':    10 obs. of  2 variables:
#>  $ NA: int  1 2 3 4 5 6 7 8 9 10
#>  $ NA: chr  "a" "b" "c" "d" ...

We just need to select those columns that are in the data frame and in the rename vector.

d2 <- data.frame(B1 = 1:10, B = letters[1:10], stringsAsFactors = FALSE)

sel <- is.element(colnames(d2), names(rename_vec))
names(d2)[sel] <- rename_vec[names(d2)][sel]
str(d2)
#> 'data.frame':    10 obs. of  2 variables:
#>  $ B1  : int  1 2 3 4 5 6 7 8 9 10
#>  $ var2: chr  "a" "b" "c" "d" ...

UPDATE: I initially had a solution here that involved string replacement, which turned out to be unsafe as well, because it allowed for partial matching. This one is better, I think.
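
To reuse the selection step across several data frames, it could be wrapped in a small function. The sketch below is only an illustration: rename_known is a hypothetical helper name, not part of base R or dplyr, and it uses match() in place of is.element() to find the columns that appear in the rename vector.

```r
# Hypothetical helper (not from any package): rename only the columns
# that appear in the lookup vector and leave all others untouched.
rename_known <- function(df, rename_vec) {
  # position of each column name in the lookup (NA when absent)
  idx <- match(names(df), names(rename_vec))
  found <- !is.na(idx)
  names(df)[found] <- unname(rename_vec[idx[found]])
  df
}

rename_vec <- c(A = "var1", B = "var2", C = "var3")

sample1 <- data.frame(A = 1:10, B = letters[1:10])
sample2 <- data.frame(B = 11:20, C = letters[11:20])
extra   <- data.frame(B1 = 1:3, B = letters[1:3])

# apply the same lookup to every data frame in the list
renamed <- lapply(list(sample1, sample2, extra), rename_known,
                  rename_vec = rename_vec)
lapply(renamed, names)
```

Running the helper a second time on its own output leaves the names unchanged, since the new names no longer occur in names(rename_vec).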

answered Jan 02 '23 08:01

dpprdan
