Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What's a good regex to include accented characters in a simple way?

Tags:

regex

Right now my regex is something like this:

[a-zA-Z0-9] but it does not include accented characters like I would want to. I would also like - ' , to be included.

like image 303
Exn Avatar asked Jul 10 '14 12:07

Exn


People also ask

What is the regex for special characters?

Special Regex Characters: These characters have special meaning in regex (to be discussed below): . , + , * , ? , ^ , $ , ( , ) , [ , ] , { , } , | , \ . Escape Sequences (\char): To match a character having special meaning in regex, you need to use a escape sequence prefix with a backslash ( \ ).

What does regex 0 * 1 * 0 * 1 * Mean?

Basically (0+1)* mathes any sequence of ones and zeroes. So, in your example (0+1)*1(0+1)* should match any sequence that has 1. It would not match 000 , but it would match 010 , 1 , 111 etc. (0+1) means 0 OR 1.

What does ?: Mean in regex?

It means that it is not capturing group.


1 Answers

Accented Characters: DIY Character Range Subtraction

If your regex engine allows it (and many will), this will work:

(?i)^(?:(?![×Þß÷þø])[-'0-9a-zÀ-ÿ])+$ 

Please see the demo (you can add characters to test).

Explanation

  • (?i) sets case-insensitive mode
  • The ^ anchor asserts that we are at the beginning of the string
  • (?:(?![×Þß÷þø])[-'0-9a-zÀ-ÿ]) matches one character...
  • The lookahead (?![×Þß÷þø]) asserts that the char is not one of those in the brackets
  • [-'0-9a-zÀ-ÿ] allows dash, apostrophe, digits, letters, and chars in a wide accented range, from which we need to subtract
  • The + matches that one or more times
  • The $ anchor asserts that we are at the end of the string

Reference

Extended ASCII Table

like image 135
zx81 Avatar answered Sep 19 '22 04:09

zx81