Match whitespace but not newlines

1 Answer

Use a double-negative:

/[^\S\r\n]/

That is, the class matches anything that is not one of: not-whitespace (the capital S complements \s), carriage return, or newline. Distributing the outer not (i.e., the complementing ^ in the character class) with De Morgan's law, this is equivalent to “whitespace but not carriage return or newline.” Including both \r and \n in the pattern correctly handles all of Unix (LF), classic Mac OS (CR), and DOS-ish (CR LF) newline conventions.
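The De Morgan equivalence can be checked mechanically. The sketch below compares the double-negative class with a more literal spelling of “whitespace but not CR or LF” (\s guarded by a negative lookahead) over the ASCII range. The lookahead form and the variable names are illustrative assumptions, not part of the original answer.

```perl
use strict;
use warnings;

# Double-negative form from the answer.
my $double_neg = qr/[^\S\r\n]/;
# Literal reading: whitespace, provided it is not CR or LF (illustrative).
my $literal    = qr/(?![\r\n])\s/;

# Count ASCII characters on which the two patterns disagree.
my $disagreements = 0;
for my $code (0 .. 127) {
    my $ch = chr $code;
    my $x  = $ch =~ $double_neg ? 1 : 0;
    my $y  = $ch =~ $literal    ? 1 : 0;
    $disagreements++ if $x != $y;
}

print "disagreements: $disagreements\n";
```

If the two patterns are equivalent on ASCII, the count printed is 0.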

No need to take my word for it:

#! /usr/bin/env perl

use strict;
use warnings;

use 5.005;  # for qr//

my $ws_not_crlf = qr/[^\S\r\n]/;

for (' ', '\f', '\t', '\r', '\n') {
  my $qq = qq["$_"];
  printf "%-4s => %s\n", $qq,
    (eval $qq) =~ $ws_not_crlf ? "match" : "no match";
}

Output:

" "  => match
"\f" => match
"\t" => match
"\r" => no match
"\n" => no match

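Note that vertical tab is absent from the test above: before Perl v5.18, \s did not match it, so neither did this pattern. Perl v5.18 addressed this by adding vertical tab to \s.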

Before you object too harshly, note that the Perl documentation uses the same technique. A footnote in the “Whitespace” section of perlrecharclass reads:

Prior to Perl v5.18, \s did not match the vertical tab. [^\S\cK] (obscurely) matches what \s traditionally did.

The same section of perlrecharclass also suggests other approaches that won’t offend language teachers’ opposition to double-negatives.

Outside locale and Unicode rules or when the /a switch is in effect, “\s matches [\t\n\f\r ] and, starting in Perl v5.18, the vertical tab, \cK.” Discard \r and \n to leave /[\t\f\cK ]/ for matching whitespace but not newline.
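As a minimal sketch of the explicit class in use, the snippet below collapses runs of non-newline whitespace to a single space while leaving both LF and CR LF line endings untouched. The sample string and variable names are made up for illustration.

```perl
use strict;
use warnings;

# Explicit class from the text: tab, form feed, vertical tab (\cK), and space.
my $ws_not_nl = qr/[\t\f\cK ]/;

# Hypothetical input mixing tabs, spaces, and both newline styles.
my $text = "alpha \t beta\r\ngamma\t\tdelta\n";

# Collapse each run of non-newline whitespace to a single space.
(my $squeezed = $text) =~ s/$ws_not_nl+/ /g;

print $squeezed;
```

Because \r and \n are not in the class, the substitution never touches the line terminators.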

If your text is Unicode, use code similar to the sub below to construct a pattern from the table in the aforementioned documentation section.

sub ws_not_nl {
  local($_) = <<'EOTable';
0x0009        CHARACTER TABULATION   h s
0x000a              LINE FEED (LF)    vs
0x000b             LINE TABULATION    vs  [1]
0x000c              FORM FEED (FF)    vs
0x000d        CARRIAGE RETURN (CR)    vs
0x0020                       SPACE   h s
0x0085             NEXT LINE (NEL)    vs  [2]
0x00a0              NO-BREAK SPACE   h s  [2]
0x1680            OGHAM SPACE MARK   h s
0x2000                     EN QUAD   h s
0x2001                     EM QUAD   h s
0x2002                    EN SPACE   h s
0x2003                    EM SPACE   h s
0x2004          THREE-PER-EM SPACE   h s
0x2005           FOUR-PER-EM SPACE   h s
0x2006            SIX-PER-EM SPACE   h s
0x2007                FIGURE SPACE   h s
0x2008           PUNCTUATION SPACE   h s
0x2009                  THIN SPACE   h s
0x200a                  HAIR SPACE   h s
0x2028              LINE SEPARATOR    vs
0x2029         PARAGRAPH SEPARATOR    vs
0x202f       NARROW NO-BREAK SPACE   h s
0x205f   MEDIUM MATHEMATICAL SPACE   h s
0x3000           IDEOGRAPHIC SPACE   h s
EOTable

  my $class;
  # The name capture allows parentheses and hyphens so that the
  # abbreviations (LF), (CR), and (NEL) are visible to the filter below.
  while (/^0x([0-9a-f]{4})\s+([A-Z\s()-]+)/mg) {
    my($hex,$name) = ($1,$2);
    # Skip the newline-like characters; everything else joins the class.
    next if $name =~ /\b(?:LF|CR|NEL|SEPARATOR)\b/;
    $class .= "\\N{U+$hex}";
  }

  qr/[$class]/u;
}
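For comparison, the double-negative class can also be extended to the Unicode line terminators directly. This is a sketch rather than the documentation's approach: it assumes Perl v5.14 or later for the /u modifier, and the decision to exclude NEL, LINE SEPARATOR, and PARAGRAPH SEPARATOR alongside CR and LF is an assumption about what counts as a “newline.”

```perl
use strict;
use warnings;
use 5.014;  # for the /u modifier

# Whitespace under Unicode rules, minus CR, LF, NEL, and the two separators.
my $uni_ws_not_nl = qr/[^\S\r\n\x{85}\x{2028}\x{2029}]/u;

# A few sample characters from the table, written as escapes.
my @samples = (
    [ 'EM SPACE',        "\x{2003}" ],
    [ 'NO-BREAK SPACE',  "\x{a0}"   ],
    [ 'NEXT LINE (NEL)', "\x{85}"   ],
    [ 'LINE SEPARATOR',  "\x{2028}" ],
);

for my $sample (@samples) {
    my ($name, $char) = @$sample;
    printf "%-16s => %s\n", $name,
        $char =~ $uni_ws_not_nl ? "match" : "no match";
}
```

The two spaces should be reported as matches and the two line terminators as non-matches.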

Other Applications

The double-negative trick is also handy for matching alphabetic characters. Remember that \w matches “word characters”: alphabetic characters, digits, and underscore. We ugly-Americans sometimes want to write it as, say,

if (/[A-Za-z]+/) { ... }

but a double-negative character-class can respect the locale:

if (/[^\W\d_]+/) { ... }

Expressing “a word character but not digit or underscore” this way is a bit opaque. A POSIX character-class communicates the intent more directly:

if (/[[:alpha:]]+/) { ... }

or with a Unicode property, as szbalint suggested:

if (/\p{Letter}+/) { ... }
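The four spellings can be compared side by side on a word containing a non-ASCII letter. This sketch makes two assumptions not in the original: it uses Unicode rules (the /u modifier, Perl v5.14 or later) in place of a locale, and the sample word and hash labels are made up for illustration.

```perl
use strict;
use warnings;
use 5.014;  # for the /u modifier

# Hypothetical sample: "café", with U+00E9 written as an escape.
my $word = "caf\x{e9}";

my %patterns = (
    'ascii'      => qr/\A[A-Za-z]+\z/,
    'double_neg' => qr/\A[^\W\d_]+\z/u,
    'posix'      => qr/\A[[:alpha:]]+\z/u,
    'property'   => qr/\A\p{Letter}+\z/,
);

# Report which patterns accept the whole word.
for my $label (sort keys %patterns) {
    printf "%-10s => %s\n", $label,
        $word =~ $patterns{$label} ? "match" : "no match";
}
```

Only the ASCII-only class should reject the word; the other three treat é as a letter.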

answered Sep 21 '22 03:09

Greg Bacon

Tags:

regex

perl

JoelFan
