Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

R: How to change the column names in a data frame based on a specification

I have a data frame, the start of it is below:

                                SM_H1455          SM_V1456          SM_K1457      SM_X1461          SM_K1462
ENSG00000000419.8                290               270               314               364               240
ENSG00000000457.8                252               230               242               220               106
ENSG00000000460.11               154               158               162               136                64
ENSG00000000938.7              20106             18664             19764             15640             19024
ENSG00000000971.11                30                10                 4                 2                10

Note that there are many more cols and rows.

Here's what I want to do: I want to change the name of the columns. The most important information in a column's name, e.g. SM_H1455, is the 4th character of the character string. In this case it's a H. What I want to do is to change the "SM" part to "Control" if the 4th character is "H" or "K", and "Case" if the 4th column is "X" or "V". I'd like to keep everything else in the name. So that in the end, I'd like a table like this:

                        Control_H1455          Case_V1456        Control_K1457      Case_X1461        Control_K1462
ENSG00000000419.8                290               270               314               364               240
ENSG00000000457.8                252               230               242               220               106
ENSG00000000460.11               154               158               162               136                64
ENSG00000000938.7              20106             18664             19764             15640             19024
ENSG00000000971.11                30                10                 4                 2                10

Please keep in mind that whether the 4th character is "V", "X", "K" or "H" is completely random.

I'd appreciate any help! Thanks.

like image 958
zfz Avatar asked Feb 16 '23 17:02

zfz


1 Answers

One way, where x is your df:

controls <- which(substring(names(x),4,4) %in% c("H","K"))
cases <- which(substring(names(x),4,4) %in% c("X","V"))
names(x)[controls] <- gsub("SM","Control",names(x)[controls])
names(x)[cases] <- gsub("SM","Case",names(x)[cases])

Alternatively:

names(x) <- sapply(names(x),function(z) {
    if(substring(z,4,4) %in% c("H","K"))
        sub("SM","Control",z)
    else if(substring(z,4,4) %in% c("X","V"))
        sub("SM","Case",z)
})
like image 166
Thomas Avatar answered Feb 18 '23 10:02

Thomas