Find indices of sequential duplicates in string in R [duplicate]

Tags: string, r

I have a string that I've converted into a character vector:

string <- c("A","A","A","C","G","G","C","C","T","T","T","T")

I'd like to output a table that shows the start and end indices of each run of consecutive identical letters, in the order they appear. For example:

letter start end
A 1 3
C 4 4
G 5 6
C 7 8
T 9 12

I've tried looking into str_locate and some other str functions but haven't been able to figure it out. Any help appreciated!

asked Jan 02 '23 00:01 by Beeba



1 Answer

I would use cumsum() on the run lengths returned by rle():
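To see why this works, here is what rle() returns for this vector and how the running total of the run lengths marks where each run ends. This is a small illustration; the names ends and starts are introduced only for this sketch.

```r
string <- c("A","A","A","C","G","G","C","C","T","T","T","T")

s <- rle(string)                # run-length encoding of the vector
s$values                        # "A" "C" "G" "C" "T": the letter of each run
s$lengths                       # 3 1 2 2 4: how many times it repeats

ends <- cumsum(s$lengths)       # 3 4 6 8 12: last index of each run
starts <- ends - s$lengths + 1  # 1 4 5 7 9: first index of each run
```

Each start is simply the end of the run minus its length, plus one, so no second pass over the data is needed.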

s <- rle(string)         # run values and run lengths
v <- cumsum(s$lengths)   # end index of each run (reuses s instead of calling rle() again)
data.frame(var = s$values, start = v + 1 - s$lengths, end = v)
  var start end
1   A     1   3
2   C     4   4
3   G     5   6
4   C     7   8
5   T     9  12
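If you need this more than once, the same rle()/cumsum() idea can be wrapped in a small function. This is a sketch: the name run_indices is hypothetical, and the column is called letter to match the table in the question. It returns a zero-row data frame for an empty vector. Note that rle() treats each NA as unequal to its neighbour, so missing values would each form their own run.

```r
# Hypothetical helper: one row per run of identical consecutive values.
run_indices <- function(x) {
  r <- rle(x)                  # values and lengths of each run
  ends <- cumsum(r$lengths)    # last index of each run
  data.frame(
    letter = r$values,
    start = ends - r$lengths + 1L,
    end = ends,
    stringsAsFactors = FALSE   # keep letters as character in R < 4.0 too
  )
}

string <- c("A","A","A","C","G","G","C","C","T","T","T","T")
run_indices(string)
```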
answered Jan 08 '23 02:01 by BENY
