
Need a JavaScript regex that requires uppercase or lowercase letters

I have a regex that right now only allows lowercase letters; I need one that requires either lowercase or uppercase letters:

/(?=.*[a-z])/
Amen asked Feb 02 '11 15:02


3 Answers

You Can’t Get There from Here

I have a regex that right now only allows lowercase letters, I need one that requires either lowercase or uppercase letters: /(?=.*[a-z])/

Unfortunately, it is utterly impossible to do this correctly using Javascript! Read the ECMA column of this flavor comparison for everything Javascript cannot do.

Theory vs Practice

The proper pattern for lowercase is the standard Unicode derived binary property \p{Lowercase}, and the proper pattern for uppercase is similarly \p{Uppercase}. These are normative properties that sometimes include non-letters in them under certain exotic circumstances.
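Since this answer was written, ECMAScript 2018 added these property escapes to JavaScript, but only inside regexes that carry the u flag. Below is a minimal sketch, assuming an ES2018-or-later engine such as a current Node.js. It checks the binary Lowercase and Uppercase properties against \p{Letter}, and the helper names are made up for illustration. The Roman numeral characters show the "non-letters" case: they are cased, but their General Category is Nl (letter number), not L.

```javascript
// Requires an ES2018+ engine and the u flag; without u, \p is not a property escape.
const isLower = (ch) => /^\p{Lowercase}$/u.test(ch);
const isUpper = (ch) => /^\p{Uppercase}$/u.test(ch);
const isLetter = (ch) => /^\p{Letter}$/u.test(ch);

// "\u2170" is SMALL ROMAN NUMERAL ONE and "\u2160" is ROMAN NUMERAL ONE.
// Both are cased (Other_Lowercase / Other_Uppercase) but are not letters.
for (const ch of ["a", "Z", "\u2170", "\u2160", "7"]) {
  console.log(ch, isLower(ch), isUpper(ch), isLetter(ch));
}
// a true false true
// Z false true true
// ⅰ true false false   (U+2170)
// Ⅰ false true false   (U+2160)
// 7 false false false
```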

Using just General Category properties, you can have \p{Ll} for Lowercase_Letter, \p{Lu} for Uppercase_Letter, and \p{Lt} for Titlecase_Letter. (Remember that there are three cases in Unicode, not two.) There is a standard alias \p{LC} which means [\p{Lu}\p{Lt}\p{Ll}].
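In an ES2018+ engine, the same General Category names work as JavaScript property escapes under the u flag. Here is a small sketch; the classify helper is hypothetical. It uses U+01C5, a titlecase letter, to show the third case falling outside both \p{Lu} and \p{Ll} while still matching \p{LC}.

```javascript
// General Category escapes (ES2018+, u flag). LC (Cased_Letter) is Lu | Ll | Lt.
const categories = {
  Ll: /^\p{Ll}$/u,
  Lu: /^\p{Lu}$/u,
  Lt: /^\p{Lt}$/u,
  LC: /^\p{LC}$/u,
};

// Return the names of every category above that the character belongs to.
function classify(ch) {
  return Object.keys(categories).filter((name) => categories[name].test(ch));
}

// "\u01C5" is LATIN CAPITAL LETTER D WITH SMALL LETTER Z WITH CARON,
// a titlecase letter: neither Lu nor Ll, but still in LC.
console.log(classify("a"));      // [ 'Ll', 'LC' ]
console.log(classify("A"));      // [ 'Lu', 'LC' ]
console.log(classify("\u01C5")); // [ 'Lt', 'LC' ]
console.log(classify("1"));      // []
```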

If you want a letter that is not a lowercase letter, you could use (?=\P{Ll})\pL. Written in longhand, that’s (?=\P{Lowercase_Letter})\p{Letter}. Again, these miss some of the Other_Lowercase code points that \p{Lowercase} recognizes. I must again stress that the Lowercase property is a superset of the Lowercase_Letter property.

Reread the previous paragraph, swapping in “upper” everywhere I have written “lower”, and you get the same thing for the capitals.
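A sketch of both lookahead patterns in JavaScript, again assuming ES2018+ and the u flag. One difference from the text: JavaScript does not accept the brace-less \pL shorthand, so the letter class must be spelled \p{L} or \p{Letter}. The variable names are illustrative only.

```javascript
// A letter that is not a lowercase letter, and a letter that is not uppercase.
// JavaScript requires braces: \p{L}, never \pL.
const notLowerLetter = /(?=\P{Ll})\p{L}/u;
const notUpperLetter = /(?=\P{Lu})\p{L}/u;

// These use General Categories, so a titlecase letter such as "\u01C5"
// satisfies both, and non-letters (digits, Roman numerals) satisfy neither.
console.log(notLowerLetter.test("abc"));    // false
console.log(notLowerLetter.test("abC"));    // true
console.log(notUpperLetter.test("ABC"));    // false
console.log(notUpperLetter.test("\u01C5")); // true
```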

Possible Platforms

Because access to these essential properties is the minimal level of critical functionality necessary for Unicode regular expressions, some versions of Javascript implement them in just the way I have written them above. However, the standard for Javascript still does not require them, so you cannot in general count on them. This means that it is impossible to do this correctly under all implementations of Javascript.
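Because support varies by engine, one option is to detect property escapes at runtime instead of assuming they exist. Below is a minimal sketch; supportsPropertyEscapes and the ASCII fallback are hypothetical choices, not a standard API. Every pattern that uses \p is built with the RegExp constructor, so an older engine can still parse the file and will simply take the fallback branch.

```javascript
// Engines without property escapes (or without the u flag) throw a
// SyntaxError when this pattern is compiled.
function supportsPropertyEscapes() {
  try {
    new RegExp("\\p{L}", "u");
    return true;
  } catch (e) {
    return false;
  }
}

// Use \p{L} when available; otherwise fall back to ASCII letters only.
const hasLetter = supportsPropertyEscapes()
  ? new RegExp("\\p{L}", "u")
  : /[A-Za-z]/;

console.log(supportsPropertyEscapes()); // true on ES2018+ engines
console.log(hasLetter.test("\u03A9"));  // true with \p{L}; false with the ASCII fallback
```

The fallback is ASCII-only, so on an engine without property escapes a Greek letter such as Ω is not counted as a letter.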

Languages in which it is possible to do at least the minimum of what you want include:

  • C♯ and Java (both only General Categories)
  • Ruby if and only if v1.9 or better (only binary properties, including General Categories)
  • PHP and PCRE (only General Category and Script properties plus a couple extras)
  • ICU’s C++ library and Perl, which both support all Unicode properties

Of those listed above, only the pair on the last line, ICU and Perl, strictly and completely meet all Level 1 compliance requirements (plus some of Levels 2 and 3) for the proper handling of Unicode in regexes. However, all of the languages in the bullets above can easily handle most, and quite probably all, of what you need.

Javascript is not among them, however. Your particular implementation might be, though, if you are very lucky and never have to run on a standard-only Javascript platform.

Summary

So very sadly, you cannot really use Javascript regexes for Unicode work unless you have a non-standard extension. Some people do, but most do not. If you do not, you may have to use a different platform until the relevant ECMA standard catches up with the 21st century (Unicode 3.1 came out a decade ago!!).

If anyone knows of a Javascript library that implements the Level 1 requirements of UTS#18 on Unicode Regular Expressions including both RL1.2 “Properties” and RL1.2a “Annex C: Compatibility Properties”, please chime in.

tchrist answered Nov 15 '22 06:11


I'm not sure whether you mean mixed case, or strictly lowercase or strictly uppercase.

Here's the mixed-case version:

/^[a-zA-Z]+$/

And the strictly one-or-the-other version:

/^([a-z]+|[A-Z]+)$/
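A quick check of both patterns in JavaScript (the sample strings are arbitrary). Both are anchored with ^ and $, so they validate the whole string rather than just requiring a letter somewhere, and both are limited to ASCII letters.

```javascript
// Mixed case: one or more ASCII letters of any case, and nothing else.
const mixedCase = /^[a-zA-Z]+$/;
// One case only: all lowercase or all uppercase ASCII letters.
const oneCase = /^([a-z]+|[A-Z]+)$/;

for (const s of ["hello", "HELLO", "Hello", "hello1"]) {
  console.log(s, mixedCase.test(s), oneCase.test(s));
}
// hello true true
// HELLO true true
// Hello true false
// hello1 false false
```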
Platinum Azure answered Nov 15 '22 05:11


Try /(?=.*[a-z])/i

Note the i at the end; it makes the expression case-insensitive.
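A short demonstration of this pattern (the sample inputs are arbitrary). With the i flag, [a-z] also matches A to Z, so the regex succeeds whenever the input contains at least one ASCII letter of either case. Because the lookahead consumes nothing and the regex is unanchored, the plain class /[a-z]/i performs the same test.

```javascript
// With i, [a-z] also matches A-Z: at least one ASCII letter anywhere.
const hasAsciiLetter = /(?=.*[a-z])/i;

console.log(hasAsciiLetter.test("abc"));  // true
console.log(hasAsciiLetter.test("ABC"));  // true
console.log(hasAsciiLetter.test("123"));  // false
console.log(hasAsciiLetter.test("123x")); // true

// Equivalent, simpler check for use with test():
console.log(/[a-z]/i.test("ABC"));        // true
```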

Leigh answered Nov 15 '22 07:11