 

Finding the indexes of multiple/overlapping matching substrings

Tags: regex, r

I have a string, s="CCCGTGCC", and a substring, ss="CC". I want to get all the indexes in s at which an occurrence of ss starts, including overlapping occurrences. In my example I would want to get back the array c(1,2,7).

Is there any string function that achieves this? Notice that my string is in the form "CCCGTGCC", and not c("C","C","C","G","T","G","C","C").

grep only returns whether there is a match anywhere in the string, and not the indexes of the matches within the string, unless I'm missing something.

dan12345 asked Oct 24 '11 16:10



1 Answer

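Try gregexpr with perl=TRUE and a Perl regular expression that uses a look-ahead assertion (see ?regex). A look-ahead is zero-width, so it consumes no characters and every overlapping occurrence is reported: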

gregexpr("(?=CC)","CCCGTGCC",perl=TRUE)
[[1]]
[1] 1 2 7
attr(,"match.length")
[1] 0 0 0
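
If you want the positions as a plain integer vector rather than the list that gregexpr returns, something like the following sketch may help. The helper name find_overlapping is hypothetical (it is not part of base R or of the answer above), and it assumes the substring contains no regex metacharacters; anything like "." or "(" would need escaping first.

```r
# Hypothetical helper: 1-based start positions of every occurrence of
# `pattern` in the single string `s`, including overlapping occurrences.
# Assumes `pattern` has no regex metacharacters that need escaping.
find_overlapping <- function(pattern, s) {
  # Wrap the pattern in a zero-width look-ahead so matches consume nothing.
  m <- gregexpr(paste0("(?=", pattern, ")"), s, perl = TRUE)[[1]]
  pos <- as.integer(m)   # as.integer() drops the match.length attribute
  pos[pos > 0L]          # gregexpr() reports -1 when there is no match
}

overlapping <- find_overlapping("CC", "CCCGTGCC")
print(overlapping)       # 1 2 7

# For contrast, a plain (consuming) pattern skips the overlapping match at 2.
non_overlapping <- as.integer(gregexpr("CC", "CCCGTGCC")[[1]])
print(non_overlapping)   # 1 7
```

When there is no match at all, the helper returns integer(0) instead of -1, which tends to be easier to work with downstream.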
Joshua Ulrich answered Nov 02 '22 19:11
