[] denotes a character class. () denotes a capturing group. (a-z0-9) -- Explicit capture of a-z0-9 . No ranges.
\u000d — Carriage return — \r. \u2028 — Line separator. \u2029 — Paragraph separator.
Flag u enables the support of Unicode in regular expressions. That means two things: Characters of 4 bytes are handled correctly: as a single character, not two 2-byte characters. Unicode properties can be used in the search: \p{…} .
"\n" matches a newline character.
\p{L}
matches a single code point in the category "letter".\p{N}
matches any kind of numeric character in any script.
Source: regular-expressions.info
If you're going to work with regular expressions a lot, I'd suggest bookmarking that site, it's very useful.
These are Unicode property shortcuts (\p{L}
for Unicode letters, \p{N}
for Unicode digits). They are supported by .NET, Perl, Java, PCRE, XML, XPath, JGSoft, Ruby (1.9 and higher) and PHP (since 5.1.0)
At any rate, that's a very strange regex. You should not be using alternation when a character class would suffice:
[\p{L}\p{N}_.-]*
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With