Suppose I have a string in R, say "ABCDEFG". I can split it into chunks of two characters with the following regex:
strsplit("ABCDEFG", "(?<=(..))", perl = TRUE)
[[1]]
[1] "AB" "CD" "EF" "G"
But I want to split it in an alternating pattern: first two characters, then one character, then two again, then one, and so on.
For the input "ABCDEFG" I want "AB" "C" "DE" "F" "G" as output (the last element holds the single remaining character).
How can I do this? I do not want to count nchar beforehand, as I want to do it dynamically.
We could generalize Edward's and rawr's ideas.
> spl_pat <- \(x, p) {
+   stopifnot(all(is.na(p) | p >= 0))
+   if (any(is.na(p))) return(x)  ## compatibility w/ strsplit()
+   if (is.null(p)) p <- 1  ## compatibility w/ strsplit()
+   .spl <- \(x) {
+     if (sum(p) == 0) return(character(0))  ## zero widths never advance
+     ## repeat the pattern often enough to cover the whole string
+     pat <- rep(p, max(1, ceiling(nchar(x)/sum(p))))
+     start <- cumsum(c(1, pat[-length(pat)]))
+     stop <- cumsum(pat)
+     Filter(nzchar, substring(x, start, stop))
+   }
+   if (length(x) > 1L) lapply(x, .spl) else .spl(x)
+ }
Single strings, length(x) == 1L:
> spl_pat('ABCDEFG', 2:1)
[1] "AB" "C" "DE" "F" "G"
> spl_pat('ABCDEFG', c(1, 4))
[1] "A" "BCDE" "F" "G"
> spl_pat('ABCDEFG', c(0, 4))
[1] "ABCD" "EFG"
> spl_pat('ABCDEFG', 1:1e3)
[1] "A" "BC" "DEF" "G"
> spl_pat('ABCDEFG', 2)
[1] "AB" "CD" "EF" "G"
> spl_pat('ABCDEFG', 1)
[1] "A" "B" "C" "D" "E" "F" "G"
> spl_pat('ABCDEFG', 0)
character(0)
> spl_pat('ABCDEFG', NA)
[1] "ABCDEFG"
> spl_pat('ABCDEFG', NULL)
[1] "A" "B" "C" "D" "E" "F" "G"
Multiple strings, length(x) > 1L:
> spl_pat(c('ABCDEFG', 'ABCDEFGHIJ'), 2:1)
[[1]]
[1] "AB" "C" "DE" "F" "G"
[[2]]
[1] "AB" "C" "DE" "F" "GH" "I" "J"
Different patterns:
> Vectorize(spl_pat)(c('ABCDEFG', 'ABCDEFGHIJ'), list(2:1, 1:2))
$ABCDEFG
[1] "AB" "C" "DE" "F" "G"
$ABCDEFGHIJ
[1] "A" "BC" "D" "EF" "G" "HI" "J"
> Vectorize(spl_pat)(c('ABCDEFG', 'ABCDEFGHIJ', 'ABCDEFGHIJ'), list(2:1, 1:2, 0))
$ABCDEFG
[1] "AB" "C" "DE" "F" "G"
$ABCDEFGHIJ
[1] "A" "BC" "D" "EF" "G" "HI" "J"
$ABCDEFGHIJ
character(0)
p < 0 probably wouldn't make sense, would it?:
> spl_pat('ABCDEFG', -1)
Error in spl_pat("ABCDEFG", -1) : all(is.na(p) | p >= 0) is not TRUE
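To see where the pieces come from, here is a minimal sketch of the index arithmetic for a single string, written out step by step. The variable names (reps, first, last, pieces) are only illustrative, and it assumes the widths in p are non-negative and sum to more than zero, so that the pattern can be repeated often enough to cover the whole string.

```r
x <- "ABCDEFG"
p <- c(2, 1)                          # piece widths, recycled along the string

reps  <- ceiling(nchar(x) / sum(p))   # full pattern cycles needed to cover x
pat   <- rep(p, reps)                 # 2 1 2 1 2 1
last  <- cumsum(pat)                  # end position of each piece:   2 3 5 6 8 9
first <- last - pat + 1               # start position of each piece: 1 3 4 6 7 9

pieces <- substring(x, first, last)   # positions past the end yield ""
pieces <- pieces[nzchar(pieces)]      # drop the empty leftovers
print(pieces)
```

Since substring() simply returns what is available when a range runs past the end of the string, the final short piece ("G" here) falls out without any special handling.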