 

Replace substring every >n characters (conditionally insert linebreaks for spaces)

Tags: regex, r, gsub

I would like to replace spaces with linebreaks (\n) in a fairly long character vector in R. However, I don't want to replace every space, only those where the substring before the space exceeds a certain number of characters (n).

Example:

mystring <- "this string is annoyingly long and therefore I would like to insert linebreaks" 

Now I want to insert a linebreak in mystring at a space only once the substring since the previous linebreak is longer than 20 characters (nchar > 20).

Hence, the resulting string is supposed to look like this:

"this string is annoyingly\nlong and therefore I would\nlike to insert linebreaks"

The linebreaks (\n) come after the first 25 and the next 26 characters; the remaining chunk is 25 characters long.
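Those chunk lengths can be checked with base R. The sketch below splits the target string on its newline characters and counts the characters in each piece. The variable names `expected` and `chunks` are illustrative only.

```r
# Target string from the question, with real newline characters
expected <- "this string is annoyingly\nlong and therefore I would\nlike to insert linebreaks"

# Split on the newlines (fixed = TRUE: no regex) and count characters per chunk
chunks <- strsplit(expected, "\n", fixed = TRUE)[[1]]
nchar(chunks)
#> [1] 25 26 25
```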

How can I achieve this? Maybe something combining gsub and strsplit?

asked Dec 12 '16 13:12 by gosz


1 Answer

You may use the regex .{21,}?\s to match 21 or more characters (21 because nchar > 20), but as few as possible, up to the nearest whitespace:

> gsub("(.{21,}?)\\s", "\\1\n", mystring)
[1] "this string is annoyingly\nlong and therefore I would\nlike to insert linebreaks"
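If the threshold needs to vary, the quantifier can be built from n with sprintf(). This is a minimal sketch: `break_after` is a hypothetical helper name, and it assumes the same "more than n characters, then the next whitespace" rule as the pattern above. Because gsub() is vectorised, the helper also works on a whole character vector.

```r
# Hypothetical helper: after every run of more than n characters, replace the
# next whitespace with a newline (n = 20 reproduces the pattern above)
break_after <- function(x, n = 20) {
  # Builds "(.{n+1,}?)\\s": n + 1 or more chars (lazy), then one whitespace
  pattern <- sprintf("(.{%d,}?)\\s", as.integer(n) + 1L)
  gsub(pattern, "\\1\n", x)
}

mystring <- "this string is annoyingly long and therefore I would like to insert linebreaks"
break_after(mystring)

# Elements shorter than the threshold are returned unchanged
break_after(c(mystring, "short one"))
```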

Details:

  • (.{21,}?) - Group 1, capturing 21 or more characters, but as few as possible ({21,}? is a lazy quantifier)
  • \\s - a single whitespace character (the backslash is doubled inside the R string literal)
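The lazy ? is what makes this work. A comparison sketch: without it, .{21,} is greedy, so the first match runs all the way to the last whitespace in the string and only one linebreak is inserted. The variable names `greedy` and `lazy` are illustrative.

```r
mystring <- "this string is annoyingly long and therefore I would like to insert linebreaks"

# Greedy: .{21,} takes as much as possible, so only the LAST space is replaced
greedy <- gsub("(.{21,})\\s", "\\1\n", mystring)
greedy
#> [1] "this string is annoyingly long and therefore I would like to insert\nlinebreaks"

# Lazy: .{21,}? stops at the first whitespace after 21 chars, once per chunk
lazy <- gsub("(.{21,}?)\\s", "\\1\n", mystring)
```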

The replacement contains the backreference to Group 1 (\\1), which reinserts the text before the whitespace, followed by a newline character; the matched whitespace itself is dropped. Feel free to add a carriage return (\r\n), too, if needed.
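Two variants, as a sketch: a CRLF replacement for Windows-style line endings, and the same pattern run with the PCRE engine (perl = TRUE) instead of R's default regex engine, which gives the same result here. writeLines() prints the result with real line breaks rather than \n escapes.

```r
mystring <- "this string is annoyingly long and therefore I would like to insert linebreaks"

# CRLF line endings: put "\r\n" after the backreference instead of "\n"
crlf <- gsub("(.{21,}?)\\s", "\\1\r\n", mystring)

# Same pattern, evaluated by PCRE instead of the default engine
lf <- gsub("(.{21,}?)\\s", "\\1\n", mystring, perl = TRUE)

# Print with actual line breaks
writeLines(lf)
#> this string is annoyingly
#> long and therefore I would
#> like to insert linebreaks
```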

answered Nov 14 '22 10:11 by Wiktor Stribiżew