
Collapse vector to string of characters with respective numbers of consecutive occurrences

Tags:

r

I would like to collapse a CIGAR vector into a CIGAR string. That is, I want a function that converts:

cigar.vector = c("M", "M", "I", "I", "M", "I", "", "M", "D", "D", "M", "I", "D", "M", "I")

to this:

cigar.string = "2M2I1M1I1M2D1M1I1D1M1I"

and vice versa.

Note that the vector contains a "" (empty string), which should not be counted. Thanks!

Dnaiel asked Sep 23 '13 22:09

1 Answer

rle (run-length encoding) seems the obvious choice here:

# Drop the "" elements, then compute the runs of identical values
rcv <- rle(cigar.vector[cigar.vector != ""])
paste0(rcv$lengths, rcv$values, collapse = "")
#[1] "2M2I1M1I1M2D1M1I1D1M1I"
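
To make the intermediate step visible, here is a small sketch of what rle returns for the question's input. The name ops is only an illustrative variable, not something from the question.

```r
cigar.vector <- c("M", "M", "I", "I", "M", "I", "", "M", "D", "D", "M", "I", "D", "M", "I")

# Remove the empty strings first so they are not counted as a run of their own
ops <- cigar.vector[cigar.vector != ""]

rcv <- rle(ops)
rcv$lengths  # length of each run:  2 2 1 1 1 2 1 1 1 1 1
rcv$values   # value of each run:   "M" "I" "M" "I" "M" "D" "M" "I" "D" "M" "I"
```

One thing to keep in mind: because the "" elements are removed before rle runs, a "" sitting between two identical letters would let those letters merge into a single run. That does not happen in this example, where the "" falls between "I" and "M".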

If you want to get fancy, you could also exploit the fact that rle gives a list of length 2:

paste(do.call(rbind,rle(cigar.vector[cigar.vector!=""])),collapse="")
#[1] "2M2I1M1I1M2D1M1I1D1M1I"
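
The reason the rbind version works may not be obvious, so here is a short sketch that breaks it apart. The variable names ops and m are illustrative only.

```r
cigar.vector <- c("M", "M", "I", "I", "M", "I", "", "M", "D", "D", "M", "I", "D", "M", "I")
ops <- cigar.vector[cigar.vector != ""]

# rle() returns a list with components "lengths" and "values";
# rbind() stacks them into a 2-row character matrix, one column per run
m <- do.call(rbind, rle(ops))
dim(m)       # 2 11
rownames(m)  # "lengths" "values"

# paste() walks the matrix column by column, so each length is
# immediately followed by its letter
paste(m, collapse = "")
```

The interleaving relies on R storing matrices in column-major order, which is why the lengths row has to come first.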

Going backwards exactly is impossible given only the result (assign the string above to result), because the positions of the "" elements have been lost. Ignoring those elements, you can recover the rest of the vector with something like:

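result <- paste0(rcv$lengths, rcv$values, collapse = "")
backwards <- rep(
  unlist(strsplit(result, "\\d+"))[-1],            # letters (drop leading "")
  as.numeric(unlist(strsplit(result, "[^0-9]")))   # run lengths
)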
identical(cigar.vector[cigar.vector!=""],backwards)
#[1] TRUE
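
Since the question asks for functions in both directions, the two steps could be wrapped up roughly as follows. This is only a sketch: the function names vector_to_cigar and cigar_to_vector are made up for illustration, and the decoder uses regmatches with gregexpr in place of the strsplit calls above. It assumes each operation is a run of non-digit characters preceded by its count.

```r
# Hypothetical helper: character vector -> CIGAR string ("" elements are ignored)
vector_to_cigar <- function(v) {
  r <- rle(v[v != ""])
  paste0(r$lengths, r$values, collapse = "")
}

# Hypothetical helper: CIGAR string -> character vector (without any "" elements)
cigar_to_vector <- function(s) {
  counts <- as.integer(regmatches(s, gregexpr("[0-9]+", s))[[1]])
  ops    <- regmatches(s, gregexpr("[^0-9]+", s))[[1]]
  # Every operation must come with exactly one count
  stopifnot(length(counts) == length(ops))
  rep(ops, times = counts)
}

cigar.vector <- c("M", "M", "I", "I", "M", "I", "", "M", "D", "D", "M", "I", "D", "M", "I")
s <- vector_to_cigar(cigar.vector)
identical(cigar_to_vector(s), cigar.vector[cigar.vector != ""])
```

Extracting the matches directly avoids having to drop the leading empty string that strsplit produces, and multi-digit counts such as "10M" are handled the same way as single digits.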
thelatemail answered Sep 22 '22 00:09