I have a vector of DNA sequence strings:
x <- c("ATTAGCCGAGC", "TTCCGGTTAA")
I would like to convert each string into a sum according to these values:
A <- 2
T <- 2
G <- 4
C <- 4
so that ATTAGCCGAGC becomes 2+2+2+2+4+4+4+4+2+4+4 and the final result is 34.
Desired output: a data frame with one column holding the original vector x and another column holding the sums.
Thanks.
I hope it is not a problem to use "T" as a name (in R, T is also shorthand for TRUE).
You can create a named vector of values, split each string into single characters, look each character up by name, and sum:
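vals <- setNames(c(2, 2, 4, 4), c('A', 'T', 'G', 'C'))  # value of each base
# split into characters, look each one up by name, sum (\(i) needs R >= 4.1.0)
sapply(strsplit(x, ''), \(i) sum(vals[i]))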
#[1] 34 28
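Because A and T are both worth 2 and G and C are both worth 4 in this particular table, the same totals can also be computed without splitting the strings: every base contributes 2, and each G or C contributes 2 more. This is a sketch of that shortcut only; it assumes exactly the values given in the question and stops being correct if the table changes.

```r
x <- c("ATTAGCCGAGC", "TTCCGGTTAA")
# every base is worth at least 2; G and C add 2 more each
# (this shortcut only holds for the A = T = 2, G = C = 4 table above)
gc_count <- nchar(gsub("[^GC]", "", x))  # number of G/C characters per string
totals <- 2 * nchar(x) + 2 * gc_count
totals
# [1] 34 28
```

The named-vector lookup is the more general choice, since it works for any set of values.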
To put the sums in a data frame alongside the original strings:
data.frame(string = x,
           val = sapply(strsplit(x, ''), \(i) sum(vals[i])))
#        string val
#1 ATTAGCCGAGC  34
#2  TTCCGGTTAA  28
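If this is needed more than once, the same lookup can be wrapped in a small function. The sketch below uses a hypothetical helper name, dna_score(), and swaps sapply() for vapply() so the result is guaranteed to be one number per string. It also assumes lowercase input should be treated like uppercase (via toupper()). Any character missing from the lookup vector, such as N, is looked up as NA, so that string's sum is NA rather than silently wrong.

```r
# hypothetical helper: scores each sequence with a named lookup vector
dna_score <- function(seqs, vals = c(A = 2, T = 2, G = 4, C = 4)) {
  # assumption: lowercase bases count the same as uppercase ones
  chars <- strsplit(toupper(seqs), "", fixed = TRUE)
  # vapply() enforces exactly one numeric value per input string;
  # characters not named in vals become NA, which makes the sum NA
  vapply(chars, function(i) sum(vals[i]), numeric(1))
}

x <- c("ATTAGCCGAGC", "TTCCGGTTAA", "acgn")
data.frame(string = x, val = dna_score(x))
```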