Greek characters, Regular Expressions, and C#

I'm building a CMS for a scientific journal that uses a lot of Greek characters. I need to validate a field so that it accepts only a specific character set plus Greek characters. Here's what I have now:

[^a-zA-Z0-9-()/\s]

How do I get this to include Greek characters in addition to alphanumeric, '(', ')', '-', and '_'?

I'm using C#, by the way.

craigmoliver asked Mar 23 '10 17:03


3 Answers

In .NET languages, you can use the named Unicode block \p{IsGreekandCoptic} to match Greek characters, so the resulting regex is:

[^a-zA-Z0-9-()/\s\p{IsGreekandCoptic}]

\p{IsGreekandCoptic} matches the characters in the Greek and Coptic block (U+0370 to U+03FF), shown in this chart: http://img203.imageshack.us/img203/3760/greekcoptic.png

Tim Pietzcker answered Oct 09 '22 15:10


If you're using a language that uses PCRE for regular expressions and UTF-8, /[\x{0374}-\x{03FF}]+/u should match Greek characters. Greek characters fall between U+0374 and U+03FF (source), inside the Greek and Coptic block (U+0370 to U+03FF), and the u modifier tells PCRE to treat the pattern and subject as Unicode. As noted in the comments, /\p{Greek}+/u works as well with PCRE.

If you're using JavaScript, write \uXXXX instead of \x{XXXX}: /[\u0374-\u03FF]+/.

Also see this guide to Unicode Regular Expressions for more information.
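
The explicit-range approach is not tied to PCRE or JavaScript. As a minimal sketch, assuming Java's java.util.regex (which accepts \uXXXX escapes inside a pattern), the range quoted above could be checked like this. The class and method names are made up for illustration:

```java
import java.util.regex.Pattern;

final class GreekRangeDemo {
    // Explicit code point range quoted in the answer (U+0374 to U+03FF).
    // The doubled backslashes pass the escapes through to the regex engine.
    static final Pattern GREEK_RANGE = Pattern.compile("[\\u0374-\\u03FF]+");

    // True only when the whole string consists of characters in the range.
    static boolean isAllGreek(String s) {
        return GREEK_RANGE.matcher(s).matches();
    }

    public static void main(String[] args) {
        // alpha, beta, gamma written as escapes to avoid source-encoding issues
        System.out.println(isAllGreek("\u03B1\u03B2\u03B3")); // true
        System.out.println(isAllGreek("abc"));                // false
    }
}
```

Note that matches() requires the entire input to be in the range, whereas the slash-delimited patterns above would match any Greek run inside a longer string.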

Daniel Vandersluis answered Oct 09 '22 16:10


For Java, from the Pattern javadoc:

\p{InGreek} A character in the Greek block (simple block)
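
To connect this back to the original validation problem, here is a rough Java sketch that adds \p{InGreek} to the negated character class from the question. GreekFieldValidator and isValid are hypothetical names, and the hyphen is escaped here only to keep it unambiguously literal inside the class:

```java
import java.util.regex.Pattern;

final class GreekFieldValidator {
    // Matches any single character that is NOT allowed in the field:
    // anything other than ASCII letters, digits, '-', '(', ')', '/',
    // whitespace, or a character in the Greek block.
    private static final Pattern DISALLOWED =
            Pattern.compile("[^a-zA-Z0-9\\-()/\\s\\p{InGreek}]");

    // The field is valid when no disallowed character is found.
    static boolean isValid(String input) {
        return !DISALLOWED.matcher(input).find();
    }

    public static void main(String[] args) {
        // "alpha-helix (beta/gamma) 42" with Greek letters as escapes
        System.out.println(isValid("\u03B1-helix (\u03B2/\u03B3) 42")); // true
        // contains ':' and the euro sign, neither of which is allowed
        System.out.println(isValid("price: 5\u20AC"));                  // false
    }
}
```

Because the class is negated, a match means the input contains a disallowed character, so the field passes only when find() returns false.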

bmargulies answered Oct 09 '22 15:10