I'm trying to create a loose word-wrapping system via a regex in Perl. Roughly every 70 characters, I would like to find the next whitespace occurrence and replace that space with a newline, and then repeat this for the whole string. The string I'm operating on may already have newlines in it, but the amount of text between newlines tends to be very lengthy.
I'd like to avoid looping one character at a time or using substr if I can, and I would prefer to edit this string in place as opposed to creating new string objects. These are just preferences, though, and if I can't achieve what I'm looking for without breaking these preferences then that's fine.
Thoughts?
This is the one I've always used.
Unlike the accepted solution, it will wrap BEFORE the wrap-length (in this case, 70 characters), unless there's a really long "word" without spaces (such as a URL), in which case it will just place that word on its own line, rather than break it.
s/(?=.{70,})(.{0,70}\n?)( )/$1$2\n /g
This second form handles all line endings: Mac \r, Unix \n, Windows \r\n, and Teletype \n\r, but which one it uses as a replacement still depends on what you put in the replacement clause: I've used \n.
s/(?=.{70,})(.{0,70}(?:\r\n?|\n\r?)?)( )/$1$2\n /g
Both versions also indent all wrapped lines after the first by one space: remove the space before the last /g if you don't want that, but I usually find it nicer.
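To see what the first pattern does, here is a small sketch with the limit lowered from 70 to 10 so the result is easy to read. The sample string and the limit are made up for illustration, and the replacement is written with $1 and $2, which behave the same as \1 and \2 there.

```perl
use strict;
use warnings;

# Made-up sample text; 19 characters long.
my $demo_text = "aaaa bbbb cccc dddd";

# Same shape as the 70-character pattern, with 10 in place of 70.
# The lookahead only lets a match start where at least 10 characters
# remain on the line, so a short tail is left alone.
(my $demo_wrapped = $demo_text) =~ s/(?=.{10,})(.{0,10}\n?)( )/$1$2\n /g;

# The matched space ($2) is kept at the end of the first line, and the
# wrapped line starts with the one-space indent.
print "[$_]\n" for split /\n/, $demo_wrapped;
# [aaaa bbbb ]
# [ cccc dddd]
```

Note that the first line keeps its trailing space; drop $2 from the replacement if you don't want it.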
Welbog's answer wraps at the first space after 70 characters. This has the flaw that long words beginning close to the end of the line make an overlong line. I would suggest instead wrapping at the last space within the first, say, 81 characters, or wrapping at the first space if you have a >80 character "word", so that only truly unbreakable lines are overlong:
s/(.{1,79}\S|\S+)\s+/$1\n/g;
In modern Perl (5.10 or later, which added \K):
s/(?:.{1,79}\S|\S+)\K\s+/\n/g;
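As a rough illustration of how this pattern picks its break points, here is the same substitution with the limit shrunk from 80 to 10 characters (so .{1,79} becomes .{1,9}). The sample sentence and the smaller limit are assumptions made only to keep the output short.

```perl
use strict;
use warnings;

# Made-up sample text.
my $fox_text = "the quick brown fox jumps over the lazy dog";

# .{1,9}\S matches 2 to 10 characters ending in a non-space, and the
# following \s+ must be whitespace, so each break lands on the last
# space that keeps the line at 10 characters or fewer. The \S+
# alternative only kicks in for a "word" longer than the limit.
(my $fox_wrapped = $fox_text) =~ s/(.{1,9}\S|\S+)\s+/$1\n/g;

print "$fox_wrapped\n";
# the quick
# brown fox
# jumps over
# the lazy
# dog
```

No line exceeds the limit here, and "jumps over" shows that a line may use all 10 characters.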
Look at modules like Text::Wrap or Text::Autoformat.
Depending on your needs, even the GNU core utility fold(1) may be an option.
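For the module route, a minimal Text::Wrap sketch might look like the following. Text::Wrap ships with Perl; the sample text is made up, and the columns value of 71 is chosen because Text::Wrap keeps every line to at most $Text::Wrap::columns - 1 characters.

```perl
use strict;
use warnings;
use Text::Wrap qw(wrap);

# Lines come out no longer than columns - 1, so 71 gives a 70-character limit.
$Text::Wrap::columns = 71;

# Made-up sample: 40 short words on one long line.
my $tw_text = join ' ', ('word') x 40;

# The first two arguments are the indent for the first line and for
# the following lines; both are empty here.
my $tw_wrapped = wrap('', '', $tw_text);

print "$tw_wrapped\n";
```

Unlike the regexes above, this builds a new string rather than editing in place, so it trades one of the stated preferences for a well-tested implementation.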
s/(.{70}[^\s]*)\s+/$1\n/g
Consume the first 70 characters, then stop at the next whitespace, capturing everything in the process. Then, emit the captured string, omitting the whitespace at the end, adding a newline.
This doesn't guarantee that your lines will be cut off at any strict limit, such as 80 characters: the last word consumed on a line could be arbitrarily long.
You can get much, much more control and reliability by using Text::Format:
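A quick way to see the overlong-line caveat is to run the same pattern with a tiny limit. This sketch uses 8 in place of 70 and a made-up string in which a long word starts just before the limit.

```perl
use strict;
use warnings;

# Made-up sample: a 12-character word begins at the 9th character.
my $long_text = "aaaa bb cccccccccccc dd";

# .{8} consumes "aaaa bb ", then [^\s]* runs to the end of the long
# word before the next whitespace is turned into a newline.
(my $long_wrapped = $long_text) =~ s/(.{8}[^\s]*)\s+/$1\n/g;

print "$long_wrapped\n";
# aaaa bb cccccccccccc
# dd
```

The first line ends up 20 characters long against a nominal limit of 8, because the break can only happen at the first whitespace after the limit.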
use Text::Format;
print Text::Format->new({columns => 70})->format($text);