This is in reference to this question.
I'd like to rename a subset of columns in a large data frame. I'd expect the following code to rename columns X4, X5, X6, and X7 to gradek, grade1, grade2, and grade3, respectively:
set.seed(1)
in.df <- data.frame( matrix( rnorm(60), ncol=10) )
names(in.df) <- ifelse( names(in.df) %in% c('X4', 'X5', 'X6', 'X7'),
paste('grade', c('k',1:3), sep=''),
names(in.df) )
However,
> names(in.df)
[1] "X1" "X2" "X3" "grade3" "gradek" "grade1" "grade2" "X8"
[9] "X9" "X10"
even though
> paste('grade', c('k',1:3), sep='')
[1] "gradek" "grade1" "grade2" "grade3"
showing that the order isn't preserved. This thread suggests that using match instead of %in% would work, but in this case it does not. (Perhaps that was true in other versions of R. In my installed version (2.15.3), the help page on match says that %in% is defined via match, so switching would be of no help.)
Any help would be appreciated!
Accepted answers: This answer fixes my renaming problem. This answer explains that the weird behavior is due to recycling.
%in% should work, but perhaps match is better.
Consider the following. "A" and "B" represent your names(in.df). We want to replace the values in "matchme", in that order, using the results of paste('grade', c('k',1:3), sep='').
Compare the different output:
A <- B <- c("X1", "X2", "X3", "X4", "X5", "X6", "X7", "X8", "X9", "X10")
matchme <- c('X4', 'X7', 'X6', 'X5')
A[A %in% matchme] <- paste('grade', c('k',1:3), sep='')
A
# [1] "X1" "X2" "X3" "gradek" "grade1" "grade2" "grade3" "X8"
# [9] "X9" "X10"
B[match(matchme, B)] <- paste('grade', c('k',1:3), sep='')
B
# [1] "X1" "X2" "X3" "gradek" "grade3" "grade2" "grade1" "X8"
# [9] "X9" "X10"
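Applied to the data frame from the question, the same idea might look like the sketch below. The names old and new are placeholders introduced here for illustration; they are not part of the original code. Because match returns the position of each old name in the order given, each new name lands on its intended column no matter how the columns are ordered in the data frame.

```r
# Rebuild the example data frame from the question
set.seed(1)
in.df <- data.frame(matrix(rnorm(60), ncol = 10))

# Hypothetical helper vectors: old names and their replacements, in matching order
old <- c("X4", "X5", "X6", "X7")
new <- paste("grade", c("k", 1:3), sep = "")

# match() gives the position of each old name within names(in.df),
# so new[i] is assigned to the column currently named old[i]
names(in.df)[match(old, names(in.df))] <- new

names(in.df)
# [1] "X1"     "X2"     "X3"     "gradek" "grade1" "grade2" "grade3" "X8"
# [9] "X9"     "X10"
```

Note that this sketch assumes every name in old is present in the data frame; a missing name would make match return NA, and the assignment would then fail.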
Ananda's answer gives a good approach of how to do what you want. I will instead answer the question as to why you got the results you did rather than the ones you expected.
The reason the names seem out of order is related to how ifelse works and argument recycling. Let's look at the three arguments to ifelse:
> list(names(in.df) %in% c('X4', 'X5', 'X6', 'X7'),
+ paste('grade', c('k',1:3), sep=''),
+ names(in.df))
[[1]]
[1] FALSE FALSE FALSE TRUE TRUE TRUE TRUE FALSE FALSE FALSE
[[2]]
[1] "gradek" "grade1" "grade2" "grade3"
[[3]]
[1] "X1" "X2" "X3" "X4" "X5" "X6" "X7" "X8" "X9" "X10"
ifelse decides which corresponding element to pick based on whether the first argument is TRUE or FALSE. But the second argument is not as long as the first, so it is recycled to be the right length. Putting these into a data.frame so that looking at them side-by-side is easier, and manually expanding out the second set of names, gives:
> data.frame(test = names(in.df) %in% c('X4', 'X5', 'X6', 'X7'),
+ `TRUE` = rep(paste('grade', c('k',1:3), sep=''),length=10),
+ `FALSE` = names(in.df))
test TRUE. FALSE.
1 FALSE gradek X1
2 FALSE grade1 X2
3 FALSE grade2 X3
4 TRUE grade3 X4
5 TRUE gradek X5
6 TRUE grade1 X6
7 TRUE grade2 X7
8 FALSE grade3 X8
9 FALSE gradek X9
10 FALSE grade1 X10
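So the 4th, 5th, 6th, and 7th elements of the recycled vector of new names are used. Because of argument recycling, these correspond to the 4th, 1st, 2nd, and 3rd elements of the original four-element vector, which is why the result starts with grade3 rather than gradek.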
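The recycling can also be checked directly. In the minimal sketch below, nms stands in for names(in.df), and test, new, and recycled are illustrative names that do not appear in the original code. It compares what ifelse picks with a manually recycled vector, then shows that subset assignment fills the TRUE positions in order instead.

```r
# Stand-in for names(in.df): "X1" ... "X10"
nms <- paste("X", 1:10, sep = "")
test <- nms %in% c("X4", "X5", "X6", "X7")
new <- paste("grade", c("k", 1:3), sep = "")

# Recycle the four new names to length 10, as ifelse() does internally
recycled <- rep(new, length.out = length(nms))
recycled[test]
# [1] "grade3" "gradek" "grade1" "grade2"

# ifelse() picks exactly these recycled elements at the TRUE positions
identical(ifelse(test, new, nms)[test], recycled[test])
# [1] TRUE

# Subset assignment uses the new names in order, with no recycling offset
nms[test] <- new
nms[4:7]
# [1] "gradek" "grade1" "grade2" "grade3"
```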