For example, I have a string "AAAAAAACGAAAAAACGAAADGCGEDCG". I want to count how many times "CG" is repeated. How do I do that?
You can use gregexpr to find the positions of "CG" in vec. When there is no match, gregexpr returns -1 instead of a position, so we have to exclude that case. The function sum then counts the number of matches.
> vec <- "AAAAAAACGAAAAAACGAAADGCGEDCG"
> sum(gregexpr("CG", vec)[[1]] != -1)
[1] 4
If you have a vector of strings, you can use sapply:
> vec <- c("ACACACACA", "GGAGGAGGAG", "AACAACAACAAC", "GGCCCGCCGC", "TTTTGTT", "AGAGAGA")
> sapply(gregexpr("CG", vec), function(x) sum(x != -1))
[1] 0 0 0 2 0 0
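As a variation on the sapply approach, here is a minimal sketch that avoids the explicit -1 check by extracting the matches with regmatches and counting them with lengths (base R, version 3.2.0 or later). The name counts is only illustrative, and fixed = TRUE is an added assumption that the pattern should be treated as a literal string rather than a regular expression.

```r
# Same example vector as above
vec <- c("ACACACACA", "GGAGGAGGAG", "AACAACAACAAC", "GGCCCGCCGC", "TTTTGTT", "AGAGAGA")

# gregexpr() finds the match positions; regmatches() extracts the matched
# substrings (an empty character vector when there is no match);
# lengths() then gives the number of matches per string.
counts <- lengths(regmatches(vec, gregexpr("CG", vec, fixed = TRUE)))
print(counts)
```

Because a string without a match yields an empty vector, its count is simply 0 and no special case is needed.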
If you have a list of strings, you can flatten it with unlist(vec) and then use the solution above.
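A small sketch of the list case, assuming a hypothetical list named seq_list whose elements are character vectors. It is flattened with unlist and then counted with the same gregexpr and sapply pattern; fixed = TRUE is again an added assumption that "CG" is a literal string.

```r
# Hypothetical input: a list holding character vectors of different lengths
seq_list <- list("AAAAAAACGAAAAAACGAAADGCGEDCG", c("GGCCCGCCGC", "TTTTGTT"))

# unlist() flattens the list into a single character vector
flat <- unlist(seq_list)

# Count the matches per string, treating the -1 "no match" marker as zero
list_counts <- sapply(gregexpr("CG", flat, fixed = TRUE), function(x) sum(x != -1))
print(list_counts)
```

Note that unlist drops the list structure, so the counts come back in flattened order, one per string.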
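The Bioconductor package Biostrings has a matchPattern function:
countGC <- matchPattern("GC", DNSstring_object)
Note that DNSstring_object is a sequence from a FASTA file read in using the Biostrings function readDNAStringSet or readAAStringSet. matchPattern returns the matches themselves, so length(countGC) gives the count; countPattern returns the count directly, and vcountPattern does the same for every sequence in a set.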
Use str_count from stringr. It's simple to remember and read, though it is not a base R function.
library(stringr)
str_count("AAAAAAACGAAAAAACGAAADGCGEDCG", "CG")
# [1] 4