Split string into substrings of given length with remainder

Given a string such as:

text <- "abcdefghijklmnopqrstuvwxyz"

I would like to chop the string into substrings of a given length, for example 10, and keep the remainder:

"abcdefghij"
"klmnopqrst"
"uvwxyz"

None of the methods I know for creating substrings give me the remaining 6-character substring. I have tried answers from previous similar questions, such as:

> substring(text, seq(1, nchar(text), 10), seq(10, nchar(text), 10))
[1] "abcdefghij" "klmnopqrst" ""  

Any advice as to how to obtain all the substrings of the desired length and any remainder strings would be much appreciated.

grdn asked Dec 15 '14 18:12


3 Answers

Try

strsplit(text, '(?<=.{10})', perl=TRUE)[[1]]
#[1] "abcdefghij" "klmnopqrst" "uvwxyz" 

Or, for a faster approach, you can use the stringi package:

library(stringi)
stri_extract_all_regex(text, '.{1,10}')[[1]]
#[1] "abcdefghij" "klmnopqrst" "uvwxyz"    

akrun answered Oct 16 '22 23:10


The vectors passed as the first and last arguments of substring can exceed the number of characters in the string without any error or warning. So you can do:

text <- "abcdefghijklmnopqrstuvwxyz"

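# start positions 1, 11, 21 (from defaults to 1; written out here for clarity)
sq <- seq.int(from = 1, to = nchar(text), by = 10)
# the last end position (30) runs past nchar(text); substring() stops at the end
substring(text, sq, sq + 9)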
# [1] "abcdefghij" "klmnopqrst" "uvwxyz"   
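
The same idea can be wrapped in a small function for an arbitrary width. This is only a sketch: the name chunk_string and its argument checks are assumptions for illustration, not part of the answer above. It also guards against the empty string, where seq.int(1, 0, by = n) would stop with an error.

```r
# Hypothetical helper (not from the original answer): split a single string x
# into pieces of width n, keeping any shorter remainder as the last piece.
chunk_string <- function(x, n) {
  stopifnot(is.character(x), length(x) == 1, !is.na(x), n >= 1)
  len <- nchar(x)
  if (len == 0) return(character(0))  # seq.int(1, 0, by = n) would error
  starts <- seq.int(1, len, by = n)
  # The final end position may exceed len; substring() simply stops at the end.
  substring(x, starts, starts + n - 1)
}

chunk_string("abcdefghijklmnopqrstuvwxyz", 10)
# [1] "abcdefghij" "klmnopqrst" "uvwxyz"
chunk_string("abcdefghij", 5)
# [1] "abcde" "fghij"
```

Computing the end positions from the start positions avoids the problem in the question, where the shorter vector of end positions was recycled and produced an empty third element.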

Rich Scriven answered Oct 17 '22 00:10


Here is a way using strapplyc from the gsubfn package, involving a fairly simple regular expression. It works because the quantifier in .{1,10} is greedy: each match takes the longest run of characters that is no longer than 10:

library(gsubfn)
strapplyc(text, ".{1,10}", simplify = c)

giving:

[1] "abcdefghij" "klmnopqrst" "uvwxyz"
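
To see what greedy matching does here, the sketch below compares the pattern with its lazy counterpart. It uses base R's gregexpr and regmatches instead of strapplyc, purely so it runs without extra packages; the variable names are illustrative.

```r
text <- "abcdefghijklmnopqrstuvwxyz"

# Greedy quantifier: each match takes as many characters as it can, up to 10,
# so only the final match is shorter.
greedy <- regmatches(text, gregexpr(".{1,10}", text, perl = TRUE))[[1]]
print(nchar(greedy))
# [1] 10 10  6

# Lazy quantifier (trailing ?): each match takes as few characters as it can,
# so the string falls apart into single characters.
lazy <- regmatches(text, gregexpr(".{1,10}?", text, perl = TRUE))[[1]]
print(length(lazy))
# [1] 26
```

The lower bound of 1 in the quantifier is what lets the last match be shorter than 10, which is how the remainder is kept.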

Visualization: this regular expression is simple enough that it does not really require a visualization, but here is one anyway:

.{1,10}

[Figure: regular expression visualization (Debuggex Demo)]

G. Grothendieck answered Oct 17 '22 00:10