Given a string such as:
text <- "abcdefghijklmnopqrstuvwxyz"
I would like to chop the string into substrings of a fixed length, for example 10, and keep the remainder:
"abcdefghij"
"klmnopqrst"
"uvwxyz"
All the methods I know for creating substrings will not give me the remainder substring with 6 characters. I have tried answers from previous similar questions such as:
> substring(text, seq(1, nchar(text), 10), seq(10, nchar(text), 10))
[1] "abcdefghij" "klmnopqrst" ""
Any advice as to how to obtain all the substrings of the desired length and any remainder strings would be much appreciated.
Try
strsplit(text, '(?<=.{10})', perl=TRUE)[[1]]
#[1] "abcdefghij" "klmnopqrst" "uvwxyz"
Or, for a faster approach, you can use the stringi package:
library(stringi)
stri_extract_all_regex(text, '.{1,10}')[[1]]
#[1] "abcdefghij" "klmnopqrst" "uvwxyz"
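If you would rather not add a package, the same pattern can probably be applied with base R alone. This is only a sketch, assuming the input is a single string; the variable name chunks is just for illustration:

```r
# Base R only: gregexpr() locates each greedy match of up to 10 characters,
# and regmatches() extracts the matched substrings.
text <- "abcdefghijklmnopqrstuvwxyz"
chunks <- regmatches(text, gregexpr(".{1,10}", text))[[1]]
chunks
# [1] "abcdefghij" "klmnopqrst" "uvwxyz"
```

regmatches returns a list with one element per input string, hence the [[1]] to pull out the character vector.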
The vectors that you pass as the first and last arguments of substring can exceed the number of characters in the string without any error or warning. So you can do:
text <- "abcdefghijklmnopqrstuvwxyz"
sq <- seq.int(to = nchar(text), by = 10)
substring(text, sq, sq + 9)
# [1] "abcdefghij" "klmnopqrst" "uvwxyz"
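The substring approach above could be wrapped in a small helper for reuse. This is a minimal sketch: the function name chunk_string is made up for illustration, and the empty-string guard is an added assumption about how you would want that case handled:

```r
# Hypothetical helper (not part of any package): split a single string
# into pieces of at most n characters, keeping the shorter remainder.
chunk_string <- function(x, n) {
  stopifnot(is.character(x), length(x) == 1, n >= 1)
  len <- nchar(x)
  # Assumption: an empty string yields no chunks at all.
  if (len == 0) return(character(0))
  starts <- seq.int(1, len, by = n)
  # The last end position may run past the end of the string;
  # substring() simply stops at the final character.
  substring(x, starts, starts + n - 1)
}

chunk_string("abcdefghijklmnopqrstuvwxyz", 10)
# [1] "abcdefghij" "klmnopqrst" "uvwxyz"
chunk_string("abcdefghij", 5)
# [1] "abcde" "fghij"
```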
Here is a way using strapplyc from the gsubfn package, involving a fairly simple regular expression. It works because .{1,10} is greedy: each match takes as many characters as are available, up to 10, so only the final match can be shorter:
library(gsubfn)
strapplyc(text, ".{1,10}", simplify = c)
giving:
[1] "abcdefghij" "klmnopqrst" "uvwxyz"