I have an attribute consisting of DNA sequences and would like to translate each sequence into its amino acid names. To do that, I need to split each sequence into fixed-length chunks of 3 characters (codons). Here is a sample of the data:
data=c("AATAGACGT","TGACCC","AAATCACTCTTT")
How can I split each sequence into the following?
[1] "AAT" "AGA" "CGT"
[2] "TGA" "CCC"
[3] "AAA" "TCA" "CTC" "TTT"
So far I have only found how to split a string using a regex as the separator.
Try
strsplit(data, '(?<=.{3})', perl=TRUE)
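The pattern (?<=.{3}) is a zero-width lookbehind: it matches any position that has three characters before it. strsplit() applies the pattern to what remains of the string after each split, so the lookbehind only ever sees the current chunk, and each piece comes out three characters long. perl = TRUE is required because lookbehind is a PCRE feature. A quick check on the sample data:

```r
data <- c("AATAGACGT", "TGACCC", "AAATCACTCTTT")

# Split at every position preceded by 3 characters (zero-width lookbehind)
codons <- strsplit(data, "(?<=.{3})", perl = TRUE)
codons
```

If a sequence length is not a multiple of 3, the last element is simply the leftover 1 or 2 characters, since no position in that remainder has three characters before it.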
Or
library(stringi)
stri_extract_all_regex(data, '.{1,3}')
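If you would rather avoid the stringi dependency, a minimal base R sketch of the same idea uses gregexpr() with regmatches() to extract every greedy match of one to three characters. The {1,3} quantifier (rather than {3}) is what keeps a trailing partial codon instead of dropping it; "AATAG" below is a made-up sequence used only to show that case.

```r
data <- c("AATAGACGT", "TGACCC", "AAATCACTCTTT")

# Base R: match runs of up to 3 characters left to right and extract them
codons <- regmatches(data, gregexpr(".{1,3}", data))

# Hypothetical length-5 sequence: '.{1,3}' keeps the 2-character remainder
partial <- regmatches("AATAG", gregexpr(".{1,3}", "AATAG"))[[1]]
partial
# "AAT" "AG"
```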
Another solution, still a one-liner but less elegant than the others, uses lapply:
lapply(data, function(u) substring(u, seq(1, nchar(u), 3), seq(3, nchar(u),3)))
#[[1]]
#[1] "AAT" "AGA" "CGT"
#[[2]]
#[1] "TGA" "CCC"
#[[3]]
#[1] "AAA" "TCA" "CTC" "TTT"
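Since the end goal is translating codons to amino acids, here is a minimal base R sketch that continues from the substring() split. It assumes the standard genetic code (NCBI table 1) and returns one-letter amino acid codes, with "*" for stop codons; split_codons is a hypothetical helper name. If Bioconductor is an option, Biostrings::translate() works directly on DNAString objects instead.

```r
# Standard genetic code (NCBI table 1), codons ordered T, C, A, G at each position
bases <- c("T", "C", "A", "G")
codon_table <- setNames(
  strsplit("FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG", "")[[1]],
  paste0(rep(bases, each = 16), rep(rep(bases, each = 4), 4), rep(bases, 16))
)

# Hypothetical helper: split one sequence into codons with substring().
# Using starts + 2 as the end avoids the uneven-length recycling of two seq() calls.
split_codons <- function(u) {
  starts <- seq(1, nchar(u), 3)
  substring(u, starts, starts + 2)
}

data <- c("AATAGACGT", "TGACCC", "AAATCACTCTTT")
codons <- lapply(data, split_codons)

# Look up each codon by name; a trailing partial codon would give NA
amino <- lapply(codons, function(x) unname(codon_table[x]))
amino
```

A named character vector keeps the lookup to a single vectorised subset per sequence, with no loop over individual codons.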