Can someone explain this regular expression?

Tags: regex, php

/^[\p{Ll}\p{Lm}\p{Lo}\p{Lt}\p{Lu}\p{Nd}]+$/mu

This is the regular expression that CakePHP uses to validate alphanumeric strings. I don't understand what Ll, Lm, Lt, etc. mean. Since it validates alphanumeric strings, it should test for letters and numbers. Could someone explain this expression a little?

Thank you.

macha asked Jan 10 '11 16:01


2 Answers

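Ll, Lm, Lo, Lt, Lu, and Nd are Unicode character classes (general categories): the L* codes are kinds of letters, and Nd is decimal digits.

The list below is quoted from about a third of the way down this page: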

http://www.regular-expressions.info/unicode.html

  • \p{Ll} or \p{Lowercase_Letter}: a lowercase letter that has an uppercase variant.
  • \p{Lu} or \p{Uppercase_Letter}: an uppercase letter that has a lowercase variant.
  • \p{Lt} or \p{Titlecase_Letter}: a letter that appears at the start of a word when only the first letter of the word is capitalized.
  • \p{L&} or \p{Letter&}: a letter that exists in lowercase and uppercase variants (combination of Ll, Lu and Lt).
  • \p{Lm} or \p{Modifier_Letter}: a special character that is used like a letter.
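  • \p{Lo} or \p{Other_Letter}: a letter or ideograph that does not have lowercase and uppercase variants.
  • \p{Nd} or \p{Decimal_Digit_Number}: a digit zero through nine in any script except ideographic scripts.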
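
To see how the whole pattern from the question behaves, here is a minimal PHP sketch. The helper name `isUnicodeAlphanumeric` and the sample strings are made up for this illustration; this is not CakePHP's own code, only the same pattern passed to `preg_match`.

```php
<?php
// Hypothetical helper (not part of CakePHP): wraps the pattern from the question.
function isUnicodeAlphanumeric(string $value): bool
{
    // ^ and $ anchor the match, and + requires at least one character.
    // The u modifier makes PCRE treat the pattern and the subject as UTF-8.
    // The m modifier lets ^ and $ also match at line breaks inside the string.
    $pattern = '/^[\p{Ll}\p{Lm}\p{Lo}\p{Lt}\p{Lu}\p{Nd}]+$/mu';
    return preg_match($pattern, $value) === 1;
}

var_dump(isUnicodeAlphanumeric('abc123'));   // bool(true): ASCII letters and digits
var_dump(isUnicodeAlphanumeric('Żółć٣'));    // bool(true): Polish letters plus an Arabic-Indic digit
var_dump(isUnicodeAlphanumeric('foo bar'));  // bool(false): a space is in none of the classes
var_dump(isUnicodeAlphanumeric('foo_bar'));  // bool(false): an underscore is punctuation, not a letter
```

One thing to keep in mind: because of the `m` modifier, a multi-line string such as `"abc\n!!!"` passes as soon as any single line consists only of letters and digits.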
Mihai Toader answered Nov 14 '22 08:11


The codes between the curly brackets (Ll, Lm, Lt, etc.) are classes of Unicode characters. A quick Google search for "Unicode character classes" turns up, for example, the following list: http://www.siao2.com/2005/04/23/411106.aspx
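
To get a feel for which characters land in which class, one can test single characters against each `\p{..}` class in turn. The sketch below assumes PHP 7 or later (for the `\u{...}` string escapes); the function name `unicodeCategory` and the sample characters are invented for this example.

```php
<?php
// Hypothetical helper for illustration: report which of the classes used in
// the question's pattern a single character belongs to, or null if none.
function unicodeCategory(string $char): ?string
{
    $categories = ['Ll', 'Lm', 'Lo', 'Lt', 'Lu', 'Nd'];
    foreach ($categories as $category) {
        // Build e.g. /^\p{Ll}$/u and test the character against it.
        if (preg_match('/^\p{' . $category . '}$/u', $char) === 1) {
            return $category;
        }
    }
    return null; // not matched by any class in the pattern
}

// "\u{01C5}" is the titlecase digraph DŽ-with-small-z, "\u{02B0}" is modifier letter small h.
$samples = ['a', 'A', "\u{01C5}", "\u{02B0}", '中', '7', '_'];
foreach ($samples as $char) {
    printf("%s => %s\n", $char, unicodeCategory($char) ?? 'none');
}
```

The samples should come out as Ll, Lu, Lt, Lm, Lo, Nd, and none, in that order; the underscore falls outside all six classes, which is why the original pattern rejects it.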

Gerke Geurts answered Nov 14 '22 10:11