What do the constructs \H, \V and \N mean?

The following constructs are not well documented, but they work in PHP from a specific version onwards. Which versions are these, what do these constructs mean, and which other implementations support them?

  • \H
  • \V
  • \N

This thread is part of The Stack Overflow Regex Reference.

asked Nov 17 '14 12:11 by Unihedron

1 Answer

\H matches any character that is not horizontal whitespace. Horizontal whitespace includes the tab character and all Unicode "space separator" (Zs) characters, so \H is the same as:

[^\h] or
[^\t\p{Zs}]
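To check this equivalence in practice, here is a minimal Java sketch. Java is used because Java 8's java.util.regex.Pattern supports \h and \H (see the implementation list below). The class and method names are hypothetical and the sample characters are an arbitrary choice. U+180E is deliberately left out, because its Unicode category changed from Zs to Cf in Unicode 6.3, so results for it may differ between JDKs.

```java
import java.util.regex.Pattern;

// Hypothetical demo class: compares \H with the two classes quoted above.
final class HorizontalWhitespaceDemo {
    static final Pattern NOT_H = Pattern.compile("\\H");
    static final Pattern NEGATED_H = Pattern.compile("[^\\h]");
    static final Pattern NOT_TAB_OR_ZS = Pattern.compile("[^\\t\\p{Zs}]");

    /** Returns true if the single character c is matched by p. */
    static boolean matches(Pattern p, char c) {
        return p.matcher(String.valueOf(c)).matches();
    }

    public static void main(String[] args) {
        // letter, digit, space, tab, no-break space, em space, ideographic space, line feed
        char[] samples = {'a', '7', ' ', '\t', '\u00A0', '\u2003', '\u3000', '\n'};
        for (char c : samples) {
            System.out.printf("U+%04X  \\H=%b  [^\\h]=%b  [^\\t\\p{Zs}]=%b%n",
                    (int) c, matches(NOT_H, c), matches(NEGATED_H, c), matches(NOT_TAB_OR_ZS, c));
        }
    }
}
```

Note that the line feed is matched by \H: it is vertical, not horizontal, whitespace.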

\V is the negation of \v. It is called "non-vertical-whitespace character" and matches any character that is not vertical whitespace, i.e. not one of the characters that the Unicode standard treats as line breaks and that \v would match. As introduced in Perl 5.10, it is the same as:

[^\v] or
[^\n\cK\f\r\x85\x{2028}\x{2029}]
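A similar Java sketch compares \V with the Perl 5.10 class above. Java's Pattern also accepts the \cK, \x85 and \x{...} escapes, so the class can be reused unchanged apart from Java string escaping. The class name and sample characters are hypothetical choices for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical demo class: compares \V with the Perl 5.10 class quoted above.
final class VerticalWhitespaceDemo {
    static final Pattern NOT_V = Pattern.compile("\\V");
    // Java accepts the \cK, \x85 and \x{...} escapes, so the class is copied as-is.
    static final Pattern PERL_CLASS =
            Pattern.compile("[^\\n\\cK\\f\\r\\x85\\x{2028}\\x{2029}]");

    /** Returns true if the single character c is matched by p. */
    static boolean matches(Pattern p, char c) {
        return p.matcher(String.valueOf(c)).matches();
    }

    public static void main(String[] args) {
        // a letter, a tab, then the seven vertical whitespace characters
        char[] samples = {'a', '\t', '\n', (char) 0x0B, '\f', '\r',
                          (char) 0x85, (char) 0x2028, (char) 0x2029};
        for (char c : samples) {
            System.out.printf("U+%04X  \\V=%b  Perl class=%b%n",
                    (int) c, matches(NOT_V, c), matches(PERL_CLASS, c));
        }
    }
}
```

The tab is matched by \V, since it is horizontal whitespace; every vertical whitespace character is rejected by both patterns.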

\N matches any character that is not the line feed character \n. Simple! Unlike ., its meaning does not change under the /s (dotall) modifier.

[^\n]
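java.util.regex has no \N, so this Java sketch uses the [^\n] equivalent and contrasts it with the dot. In Java, . by default also rejects \r and the other line terminators, while Pattern.DOTALL makes it accept \n as well; [^\n] behaves the same regardless of flags. The class name is hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical demo class: java.util.regex has no \N, so [^\n] stands in for it.
final class NotLineFeedDemo {
    static final Pattern NOT_LF = Pattern.compile("[^\\n]");
    static final Pattern DOT = Pattern.compile(".");
    static final Pattern DOT_ALL = Pattern.compile(".", Pattern.DOTALL);

    /** Returns true if the single character c is matched by p. */
    static boolean matches(Pattern p, char c) {
        return p.matcher(String.valueOf(c)).matches();
    }

    public static void main(String[] args) {
        for (char c : new char[] {'a', '\r', '\n'}) {
            // [^\n] ignores flags; Java's default '.' also rejects CR, DOTALL accepts LF
            System.out.printf("U+%04X  [^\\n]=%b  .=%b  .(DOTALL)=%b%n",
                    (int) c, matches(NOT_LF, c), matches(DOT, c), matches(DOT_ALL, c));
        }
    }
}
```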

What's the difference between \V+ and \N+? (Thanks to Avinash Raj for asking.)

As the Perl 5.10 documentation specifies, \V is the same as [^\n\cK\f\r\x85\x{2028}\x{2029}], so it matches none of \n, \r, \f, the vertical tab \cK (Ctrl+K, 0x0B), 0x85, 0x2028 or 0x2029. \N excludes only \n, so it does match \r and the other vertical whitespace characters.
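The difference shows up on Windows-style \r\n line endings. In this Java sketch, [^\n]+ stands in for \N+ because Java lacks \N; the class and helper names are hypothetical. \V+ stops before the \r, while [^\n]+ includes it.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: [^\n]+ stands in for \N+, which Java lacks.
final class VPlusVersusNPlus {
    /** Returns the first match of regex in text, or null if there is none. */
    static String firstMatch(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.group() : null;
    }

    /** Makes a carriage return visible when printing. */
    static String show(String s) {
        return s.replace("\r", "\\r");
    }

    public static void main(String[] args) {
        String text = "line one\r\nline two"; // Windows-style line ending
        System.out.println("\\V+    -> " + show(firstMatch("\\V+", text)));
        System.out.println("[^\\n]+ -> " + show(firstMatch("[^\\n]+", text)));
    }
}
```

The first line prints the match "line one"; the second prints "line one\r", with the carriage return still attached.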

These character classes are handy when you want to match everything within a single horizontal line of text (\V+), or to consume an entire paragraph up to the next line feed (\N+), among various other use cases.
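As an illustration of the \V+ use case, this Java sketch collects every run of non-vertical-whitespace characters, which splits text on \n, \r\n, \r and U+2028 alike. The class and the lines() helper are hypothetical. Empty lines produce no match, because + needs at least one character.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: extracts the non-empty lines of text using \V+.
final class LinesWithV {
    private static final Pattern LINE = Pattern.compile("\\V+");

    /** Collects every non-empty run of non-vertical-whitespace characters. */
    static List<String> lines(String text) {
        List<String> out = new ArrayList<>();
        Matcher m = LINE.matcher(text);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        // LF, CRLF, CR and the Unicode line separator mixed together
        String text = "alpha\nbeta\r\ngamma\rdelta\u2028epsilon";
        System.out.println(lines(text)); // [alpha, beta, gamma, delta, epsilon]
    }
}
```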


The following implementations support \H, \V and \N:

  • Perl 5.10 (\H and \V; \N was added in Perl 5.12)
  • PCRE 7.2
  • PHP: programmers may find a discrepancy in which PHP versions support these constructs, because support depends on the bundled PCRE library rather than on PHP itself. Check the PCRE version instead, which phpinfo() reports. With its bundled PCRE, PHP 5.2.2 supports them by default.
  • Java 8: java.util.regex.Pattern added support for \H and \V as part of implementing \h and \v, which was not the case in Java 7; however, \N is not supported. Tested with JDK 8u25.
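To see what a given JDK accepts, this Java sketch simply tries to compile each construct; the class name is hypothetical. On Java 8 and later, \h, \H, \v and \V should compile while a bare \N throws PatternSyntaxException. Newer JDKs accept \N{name} for named characters, which is a different construct.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical check: which of these escapes does the running JDK accept?
final class JavaSupportCheck {
    /** Returns true if Pattern.compile accepts the regex. */
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        for (String regex : new String[] {"\\h", "\\H", "\\v", "\\V", "\\N"}) {
            System.out.println(regex + " -> " + (compiles(regex) ? "supported" : "rejected"));
        }
    }
}
```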
answered Oct 05 '22 12:10 by Unihedron