
How can I word wrap a string in Perl?

Tags: string, regex, perl

I'm trying to create a loose word-wrapping system via a regex in Perl. Roughly every 70 characters, I would like to find the next whitespace character and replace it with a newline, repeating this across the whole string. The string may already contain newlines, but the stretches of text between them tend to be very long.

I'd like to avoid looping one character at a time or using substr if I can, and I would prefer to edit this string in place as opposed to creating new string objects. These are just preferences, though, and if I can't achieve what I'm looking for without breaking these preferences then that's fine.

Thoughts?

Kyle Walsh asked Jun 05 '09 15:06


5 Answers

This is the one I've always used.

Unlike the accepted solution, it will wrap BEFORE the wrap-length (in this case, 70 characters), unless there's a really long "word" without spaces (such as a URL), in which case it will just place that word on its own line, rather than break it.

s/(?=.{70,})(.{0,70}\n?)( )/$1$2\n /g

This second form handles all line endings: Mac \r, Unix \n, Windows \r\n, and Teletype \n\r, but which one it uses as a replacement still depends on what you put in the replacement clause: I've used \n.

s/(?=.{70,})(.{0,70}(?:\r\n?|\n\r?)?)( )/$1$2\n /g

Both versions also indent all wrapped lines after the first by one space: remove the space before the last /g if you don't want that, but I usually find it nicer.

Dewi Morgan answered Oct 18 '22 00:10


Welbog's answer wraps at the first space after 70 characters. This has the flaw that long words beginning close to the end of the line make an overlong line. I would suggest instead wrapping at the last space within the first, say, 81 characters, or wrapping at the first space if you have a >80 character "word", so that only truly unbreakable lines are overlong:

s/(.{1,79}\S|\S+)\s+/$1\n/g;

In modern Perl (5.10 or later, where \K is available):

s/(?:.{1,79}\S|\S+)\K\s+/\n/g;
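
As a rough illustration of the "last space within the limit" idea, the sketch below makes the width a parameter so the effect is easy to see on a short sample. The helper name wrap_at_last_space, the width of 20, and the sample text are assumptions made for this example, not part of the answer. The sample ends with a newline because the pattern only matches a chunk that is followed by whitespace.

```perl
use strict;
use warnings;

# Hypothetical helper: break $text at the last whitespace that keeps a line
# within $width characters; a longer "word" is left alone on its own line.
sub wrap_at_last_space {
    my ($text, $width) = @_;
    my $max = $width - 1;    # .{1,$max}\S matches between 2 and $width characters
    $text =~ s/(.{1,$max}\S|\S+)\s+/$1\n/g;
    return $text;
}

# Assumed sample text, newline-terminated.
my $text = "The quick brown fox jumps over the lazy dog near the riverbank today\n";
my $wrapped = wrap_at_last_space($text, 20);
print $wrapped;
```

With a width of 20 this prints four lines, none longer than 20 characters.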

ysth answered Nov 13 '22 11:11


Look at modules like Text::Wrap or Text::Autoformat.

Depending on your needs, even the GNU core utility fold(1) may be an option.
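
A minimal sketch of the Text::Wrap route, assuming a plain paragraph with no indentation. The sample text is made up for the example; wrap and $Text::Wrap::columns come from the module itself.

```perl
use strict;
use warnings;
use Text::Wrap qw(wrap);

# Text::Wrap keeps every output line shorter than $columns,
# so 71 allows lines of up to 70 characters.
$Text::Wrap::columns = 71;

# Hypothetical sample text, long enough to need several lines.
my $text = join ' ', ('the quick brown fox jumps over the lazy dog') x 6;

# The first two arguments are the indent for the first line and for
# subsequent lines; empty strings mean no indentation.
my $wrapped = wrap('', '', $text);
print "$wrapped\n";
```

Unlike the regex approaches, this returns a new string rather than editing in place, which may or may not matter for the preferences stated in the question.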

Fritz G. Mehner answered Nov 13 '22 13:11


s/(.{70}\S*)\s+/$1\n/g;

Consume the first 70 characters, then stop at the next whitespace, capturing everything in the process. Then, emit the captured string, omitting the whitespace at the end, adding a newline.

This doesn't guarantee that your lines will be cut off at any strict limit, such as 80 characters. There's no guarantee the last word it consumes won't be a billion characters long.
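
To make the overlong-line caveat concrete, here is a small sketch that applies the same kind of substitution with a limit of 10 instead of 70. The sample text, the limit, and the variable names are assumptions chosen for the example.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Hypothetical sample containing one long token that cannot be broken.
my $text = "short words then an_extremely_long_unbreakable_token and more text here\n";

# Same shape as the substitution above, with /g so it repeats across the
# whole string. \S* is equivalent to [^\s]*.
(my $wrapped = $text) =~ s/(.{10}\S*)\s+/$1\n/g;

# Length of the longest resulting line.
my $longest = max map { length $_ } split /\n/, $wrapped;
print $wrapped;
print "longest line: $longest\n";
```

Here the second line comes out 40 characters long against a limit of 10, because the pattern always runs on to the end of whatever word it is in.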

Welbog answered Nov 13 '22 12:11


You can get much, much more control and reliability by using Text::Format:

use Text::Format;
print Text::Format->new({columns => 70})->format($text);

cubabit answered Nov 13 '22 13:11