Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Regular expression \p{L} and \p{N}

People also ask

What's the difference between () and [] in regular expression?

[] denotes a character class. () denotes a capturing group. (a-z0-9) -- Explicit capture of a-z0-9 . No ranges.

What is the regex for Unicode paragraph seperator?

\u000d — Carriage return — \r. \u2028 — Line separator. \u2029 — Paragraph separator.

What is U flag in regex?

Flag u enables the support of Unicode in regular expressions. That means two things: Characters of 4 bytes are handled correctly: as a single character, not two 2-byte characters. Unicode properties can be used in the search: \p{…} .

What does N mean in regex?

"\n" matches a newline character.


\p{L} matches a single code point in the category "letter".
\p{N} matches any kind of numeric character in any script.

Source: regular-expressions.info

If you're going to work with regular expressions a lot, I'd suggest bookmarking that site, it's very useful.


These are Unicode property shortcuts (\p{L} for Unicode letters, \p{N} for Unicode digits). They are supported by .NET, Perl, Java, PCRE, XML, XPath, JGSoft, Ruby (1.9 and higher) and PHP (since 5.1.0)

At any rate, that's a very strange regex. You should not be using alternation when a character class would suffice:

[\p{L}\p{N}_.-]*