How to convert values in column matching pattern in R

Tags:

r

I have this data frame, mydf. The column nucleotide can contain the letters A, T, G and C. When the strand column is "-", I want to replace each letter with its complement: A with T, C with G, G with C, and T with A. How do I do it?

  mydf <- structure(list(seqnames = structure(c(1L, 1L, 1L, 1L), .Label = c("chr1", 
    "chr2", "chr3", "chr4", "chr5", "chr6", "chr7", "chr8", "chr9", 
    "chr10", "chr11", "chr12", "chr13", "chr14", "chr15", "chr16", 
    "chr17", "chr18", "chr19", "chr20", "chr21", "chr22", "chrX", 
    "chrY", "chrM"), class = "factor"), pos = c(115258748, 115258748, 
    115258748, 115258748), strand = structure(c(1L, 2L, 1L, 2L), .Label = c("+", 
    "-", "*"), class = "factor"), nucleotide = structure(c(2L, 2L, 
    2L, 2L), .Label = c("A", "C", "G", "T", "N", "=", "-"), class = "factor")), .Names = c("seqnames", 
    "pos", "strand", "nucleotide"), row.names = c(NA, 4L), class = "data.frame")

Desired result:

 seqnames       pos strand nucleotide
1     chr1 115258748      +          C
2     chr1 115258748      -          G
3     chr1 115258748      +          C
4     chr1 115258748      -          G
asked Oct 14 '15 at 05:10 by MAPK



1 Answer

For one-to-one character translation, you can use chartr().

within(mydf, {
  nucleotide[strand == "-"] <- chartr("ACGT", "TGCA", nucleotide[strand == "-"])
})
#   seqnames       pos strand nucleotide
# 1     chr1 115258748      +          C
# 2     chr1 115258748      -          G
# 3     chr1 115258748      +          C
# 4     chr1 115258748      -          G
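To see what chartr() does in isolation, here is a small sketch on a plain character vector (the vector bases is a made-up example, not from the question). Each character in the first argument is replaced by the character at the same position in the second, so "ACGT" to "TGCA" complements every base; characters not listed, such as N, are left unchanged, and multi-letter strings are translated character by character.

```r
# Hypothetical input: single bases plus one longer sequence containing N.
bases <- c("A", "C", "G", "T", "ACGTN")

# Position-wise translation: A->T, C->G, G->C, T->A; other characters kept.
complemented <- chartr("ACGT", "TGCA", bases)
print(complemented)
```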

Note that I used within() here to avoid typing mydf$ repeatedly. within() also returns a modified copy, so mydf itself is unchanged unless you assign the result (for example, mydf2 <- within(mydf, ...)). You can also write the following, but keep in mind that it modifies mydf in place.

mydf$nucleotide[mydf$strand == "-"] <- 
    with(mydf, chartr("ACGT", "TGCA", nucleotide[strand == "-"]))
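One detail worth knowing: in the question's data, nucleotide is a factor. chartr() coerces it to character, and assigning the result back into the factor only works because the complemented letters (A, C, G, T) are already levels; a value that is not a level would become NA with a warning. A minimal sketch, assuming it is acceptable to store the column as character, converts it first so the result does not depend on the factor levels. The data frame is rebuilt here from the question's dput() output so the block runs on its own.

```r
# Rebuild the question's example data (same values and levels as the dput()).
mydf <- data.frame(
  seqnames   = factor(rep("chr1", 4)),
  pos        = rep(115258748, 4),
  strand     = factor(c("+", "-", "+", "-"), levels = c("+", "-", "*")),
  nucleotide = factor(rep("C", 4),
                      levels = c("A", "C", "G", "T", "N", "=", "-"))
)

# Store the bases as character so assignment never depends on factor levels.
mydf$nucleotide <- as.character(mydf$nucleotide)

# Complement only the rows on the minus strand.
minus <- mydf$strand == "-"
mydf$nucleotide[minus] <- chartr("ACGT", "TGCA", mydf$nucleotide[minus])
print(mydf)
```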
answered Nov 09 '22 at 11:11 by Rich Scriven
