
Regular expression to catch letters beyond a-z

Tags: c#, regex

A normal regexp to allow letters only would be "[a-zA-Z]", but I'm from Sweden, so I would have to change that into "[a-zåäöA-ZÅÄÖ]". But suppose I don't know what letters are used in the alphabet.

Is there a way to automatically know which characters are valid in a given locale/language, or should I just make a blacklist of characters that I (think I) know I don't want?

Nifle asked Mar 17 '09 21:03



2 Answers

You can use \p{L} to match any character in the Unicode 'letter' category, which covers letters in all languages (.NET requires the braces; the short form \pL is rejected). You can narrow it down to specific scripts using 'named blocks' such as \p{IsGreek}. More information can be found in the Character Classes documentation on MSDN.
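
As a minimal sketch of the \p{L} approach, assuming the goal is to check that a whole string consists of letters only (the class and method names here are made up for illustration):

```csharp
using System;
using System.Text.RegularExpressions;

public static class LetterCheck
{
    // \p{L} matches any character in the Unicode "Letter" general category.
    // \A and \z anchor the match to the whole input, so every character
    // must be a letter and the input must not be empty.
    private static readonly Regex LettersOnly = new Regex(@"\A\p{L}+\z");

    public static bool IsLettersOnly(string input)
    {
        return LettersOnly.IsMatch(input);
    }
}
```

With this, LetterCheck.IsLettersOnly("Smörgåsbord") should succeed without å, ä or ö being listed explicitly, while "abc123" is rejected. One caveat: letters written as a base letter plus a combining mark are not covered by \p{L} alone, since the mark belongs to \p{M}.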

My recommendation would be to put the regular expression (or at least the "letter" part) into a localised resource, which you can then pull out based on the current locale and form into the larger pattern.
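
A rough sketch of that idea, assuming a plain dictionary as a stand-in for a real localised resource (in practice this would likely be a .resx entry); the class name, the table contents and the \p{L} fallback are all assumptions for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text.RegularExpressions;

public static class LocalizedLetters
{
    // Hypothetical stand-in for a localised resource:
    // two-letter language code -> the "letter" part of the pattern.
    private static readonly Dictionary<string, string> LetterClasses =
        new Dictionary<string, string>
        {
            { "sv", "[a-zåäöA-ZÅÄÖ]" },
            { "en", "[a-zA-Z]" },
        };

    // Assumed fallback when the table has no entry: any Unicode letter.
    private const string DefaultLetterClass = @"\p{L}";

    public static Regex BuildWordPattern(CultureInfo culture)
    {
        string letterClass;
        if (!LetterClasses.TryGetValue(culture.TwoLetterISOLanguageName, out letterClass))
        {
            letterClass = DefaultLetterClass;
        }

        // Form the larger pattern around the locale-specific letter part.
        return new Regex(@"\A" + letterClass + @"+\z");
    }
}
```

Calling LocalizedLetters.BuildWordPattern(CultureInfo.CurrentCulture) then picks the letter set for whatever locale the application is running under, and the rest of the pattern stays in one place.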

Richard Szalay answered Sep 23 '22 05:09


What about \p{name}?

Matches any character in the named character class specified by {name}. Supported names are Unicode groups and block ranges. For example, Ll, Nd, Z, IsGreek, IsBoxDrawing.

I don't know enough about Unicode, but maybe your characters fit a Unicode class?
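
To make the quoted names concrete, here is a small sketch that tests single characters against three of them; the helper class and method names are hypothetical:

```csharp
using System;
using System.Text.RegularExpressions;

public static class UnicodeClassDemo
{
    // Ll = lowercase letter (a Unicode general category).
    public static bool IsLowercaseLetter(string s)
    {
        return Regex.IsMatch(s, @"\A\p{Ll}\z");
    }

    // Nd = decimal digit (a Unicode general category).
    public static bool IsDecimalDigit(string s)
    {
        return Regex.IsMatch(s, @"\A\p{Nd}\z");
    }

    // IsGreek = the Greek and Coptic named block (U+0370 to U+03FF).
    public static bool IsInGreekBlock(string s)
    {
        return Regex.IsMatch(s, @"\A\p{IsGreek}\z");
    }
}
```

Here "ö" counts as a lowercase letter while "Ö" does not, "7" is a decimal digit, and "λ" falls in the Greek block while "a" does not. For the Swedish case in the question, the general letter category \p{L} (or \p{Ll} and \p{Lu} separately) is probably the closest fit, since å, ä and ö do not have a block of their own.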

Ray answered Sep 23 '22 05:09