
Transforming character strings into sums when every character represents one number

Tags: string, r

I have a vector containing DNA sequence strings:

x <- c("ATTAGCCGAGC", "TTCCGGTTAA")

I would like to transform each of these strings into a sum according to the rule

A <- 2
T <- 2
G <- 4
C <- 4

so that ATTAGCCGAGC is translated to "2+2+2+2+4+4+4+4+2+4+4" and the final output would be "34".

Desired output: a data frame consisting of a column with the original vector x and another column with the "sum-transformations".

Thanks.

I hope that it's not a problem to use "T".

nouse asked Oct 27 '25 05:10


1 Answer

You can create a named vector with the values, split the strings into single characters, look each character up by name and sum the results, i.e.

vals <- setNames(c(2, 2, 4, 4), c('A', 'T', 'G', 'C'))
sapply(strsplit(x, ''), \(i)sum(vals[i]))
#[1] 34 28
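
To see what the lookup does, here is a minimal sketch of the same steps for a single string. The intermediate names chars and scores are only illustrative. Note that "T" is used purely as a name (a string), so it should not clash with R's T shorthand for TRUE.

```r
vals <- setNames(c(2, 2, 4, 4), c("A", "T", "G", "C"))

# Split one sequence into single characters
chars <- strsplit("ATTAGCCGAGC", "")[[1]]

# Indexing a named vector with a character vector looks up each name in turn;
# unname() drops the A/T/G/C labels so only the numbers remain
scores <- unname(vals[chars])
scores
#[1] 2 2 2 2 4 4 4 4 2 4 4

sum(scores)
#[1] 34
```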

Put them in a data frame like this:

data.frame(string = x, 
           val = sapply(strsplit(x, ''), \(i)sum(vals[i])))

       string val
1 ATTAGCCGAGC  34
2  TTCCGGTTAA  28
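
If the sequences might contain characters outside A, T, G and C (an assumption; the question does not say so), vals[i] returns NA for them and the sum becomes NA. A possible sketch that wraps the same idea in a function and fails early instead; seq_sum is a hypothetical helper name, and vapply is swapped in for sapply only to guarantee a numeric result:

```r
vals <- setNames(c(2, 2, 4, 4), c("A", "T", "G", "C"))

# seq_sum is a hypothetical helper name, not part of the answer above
seq_sum <- function(x, vals) {
  vapply(strsplit(x, ""), function(chars) {
    # Characters missing from `vals` would be looked up as NA, so stop instead
    unknown <- setdiff(chars, names(vals))
    if (length(unknown) > 0) {
      stop("unknown character(s): ", paste(unknown, collapse = ", "))
    }
    sum(vals[chars])
  }, numeric(1))
}

x <- c("ATTAGCCGAGC", "TTCCGGTTAA")
res <- data.frame(string = x, val = seq_sum(x, vals))
```

With the two example strings, res should match the data frame shown above; whether to stop, warn or skip unknown characters depends on the data.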
Sotos answered Oct 28 '25 19:10