
Renaming a few columns in a data frame: Why doesn't ifelse with %in% operator preserve order?

Tags:

r

This is in reference to this question.

I'd like to rename a subset of columns in a large data frame. I'd expect the following code to rename columns X4, X5, X6, and X7 to gradek, grade1, grade2, and grade3, respectively:

set.seed(1)
in.df <- data.frame( matrix( rnorm(60), ncol=10) )
names(in.df) <- ifelse( names(in.df) %in% c('X4', 'X5', 'X6', 'X7'),
                         paste('grade', c('k',1:3), sep=''),
                         names(in.df) )

However,

> names(in.df)
 [1] "X1"     "X2"     "X3"     "grade3" "gradek" "grade1" "grade2" "X8"    
 [9] "X9"     "X10"   

even though

> paste('grade', c('k',1:3), sep='')
[1] "gradek" "grade1" "grade2" "grade3"

showing that the order isn't preserved. This thread suggests that using match instead of %in% would work, but in this case it does not. (Perhaps that was true in other versions of R. In my installed version (2.15.3), the help page on match says that %in% is defined via match, so switching would not help.)
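The relationship between %in% and match mentioned above can be checked directly. The following is a minimal sketch, assuming the definition given on the ?match help page (match with nomatch = 0, compared against 0); the names nms and targets are made up for illustration.

```r
# Hypothetical stand-ins for names(in.df) and the columns to rename
nms <- paste0("X", 1:10)
targets <- c("X4", "X5", "X6", "X7")

# %in% returns a logical vector as long as its left-hand side ...
via_in <- nms %in% targets

# ... which is what match() gives once non-matches are mapped to 0
via_match <- match(nms, targets, nomatch = 0L) > 0L

identical(via_in, via_match)
```

Both forms only say whether each name is in the set, not which replacement belongs to it, so swapping one for the other as the test of ifelse would not be expected to change the result.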

Any help would be appreciated!

Accepted answers: This answer fixes my renaming problem. This answer explains that the weird behavior is due to recycling.

Nathan VanHoudnos asked Jan 14 '23 14:01


2 Answers

%in% should work, but perhaps match is better.

Consider the following. A and B both stand in for your names(in.df). We want to replace the values listed in matchme, in the order they are listed, with the results of paste('grade', c('k',1:3), sep='').

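Compare the output. Logical indexing with %in% assigns the new names in the order the columns appear in A, while match assigns them in the order given in matchme: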

A <- B <- c("X1", "X2", "X3", "X4", "X5", "X6", "X7", "X8", "X9", "X10")
matchme <- c('X4', 'X7', 'X6', 'X5')
A[A %in% matchme] <- paste('grade', c('k',1:3), sep='')
A
#  [1] "X1"     "X2"     "X3"     "gradek" "grade1" "grade2" "grade3" "X8"    
#  [9] "X9"     "X10"  
B[match(matchme, B)] <- paste('grade', c('k',1:3), sep='')
B
#  [1] "X1"     "X2"     "X3"     "gradek" "grade3" "grade2" "grade1" "X8"    
#  [9] "X9"     "X10"   
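Applied to the data frame from the question, the match-based assignment might look like the sketch below. The vector names old and new are introduced here for illustration only, and the sketch assumes every name in old is actually present in names(in.df).

```r
set.seed(1)
in.df <- data.frame(matrix(rnorm(60), ncol = 10))

old <- c("X4", "X5", "X6", "X7")          # columns to rename (hypothetical name)
new <- paste0("grade", c("k", 1:3))       # replacements, paired with `old` by position

# match() returns the position of each old name, in the order given in `old`,
# so new[i] always replaces old[i] regardless of the column order in in.df.
names(in.df)[match(old, names(in.df))] <- new
names(in.df)
```

Because the pairing is by position in old and new rather than by position in the data frame, reordering old (together with new) does not change which column gets which name.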
A5C1D2H2I1M1N2O1R2T1 answered Jan 19 '23 11:01


Ananda's answer gives a good approach for doing what you want. I will instead answer the question of why you got the results you did rather than the ones you expected.

The reason the names seem out of order is related to how ifelse works and argument recycling. Let's look at the three arguments to ifelse:

> list(names(in.df) %in% c('X4', 'X5', 'X6', 'X7'),
+      paste('grade', c('k',1:3), sep=''),
+      names(in.df))
[[1]]
 [1] FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE

[[2]]
[1] "gradek" "grade1" "grade2" "grade3"

[[3]]
 [1] "X1"  "X2"  "X3"  "X4"  "X5"  "X6"  "X7"  "X8"  "X9"  "X10"

ifelse decides which corresponding element to pick based on whether the first argument is TRUE or FALSE. But the second argument is not as long as the first, so it is recycled to be the right length. Putting these into a data.frame so that looking at them side-by-side is easier, and manually expanding out the second set of names, gives:

> data.frame(test = names(in.df) %in% c('X4', 'X5', 'X6', 'X7'),
+            `TRUE` = rep(paste('grade', c('k',1:3), sep=''),length=10),
+            `FALSE` = names(in.df))
    test  TRUE. FALSE.
1  FALSE gradek     X1
2  FALSE grade1     X2
3  FALSE grade2     X3
4   TRUE grade3     X4
5   TRUE gradek     X5
6   TRUE grade1     X6
7   TRUE grade2     X7
8  FALSE grade3     X8
9  FALSE gradek     X9
10 FALSE grade1    X10

So the 4th, 5th, 6th, and 7th elements of the recycled vector of new names are used. Due to argument recycling, these correspond to the 4th, 1st, 2nd, and 3rd elements of the original four names: grade3, gradek, grade1, and grade2.
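The recycling can be reproduced without ifelse, and one possible way to keep ifelse is to pass it a replacement vector that is already aligned with the names. This is a sketch under the assumption that the names are X1 to X10 as in the question; nms, old, new, idx, and fixed are illustrative names, not part of the original code.

```r
nms <- paste0("X", 1:10)
old <- c("X4", "X5", "X6", "X7")
new <- paste0("grade", c("k", 1:3))

# What ifelse() effectively does with a short `yes` argument:
# recycle it to the length of the test, then pick elements by position.
recycled <- rep_len(new, length(nms))
recycled[nms %in% old]

# Alternative: index `new` by match() so each element lines up with its name.
idx <- match(nms, old)                      # NA where a name is not being renamed
fixed <- ifelse(is.na(idx), nms, new[idx])
fixed
```

The first result shows the shifted order from the question; the second keeps the names in the intended positions because new[idx] has the same length as nms before ifelse ever sees it.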

Brian Diggs answered Jan 19 '23 11:01