
How do I write regexes for German character classes like letters, vowels, and consonants?

For example, I set up these:

L = /[a-z,A-Z,ßäüöÄÖÜ]/
V = /[äöüÄÖÜaeiouAEIOU]/
K = /[ßb-zBZ&&[^#{V}]]/

So that /(#{K}#{V}{2})/ matches "ßäÜ" in "azAZßäÜ".

Are there any better ways of dealing with them?

Could I put those constants in a module in a file somewhere in my Ruby installation folder, so I can include/require them inside any new script I write on my computer? (I'm a newbie and I know I'm muddling this terminology; please correct me.)

Furthermore, could I get just the meta-characters \L, \V, and \K (or whatever isn't already set in Ruby) to stand for them in regexes, so I don't have to do that string interpolation thing all the time?

Owen_AR asked Apr 19 '13 09:04


1 Answer

You're off to a good start, but it's worth reading through the Regexp class that ships with Ruby. Patterns can build themselves through String interpolation: you write the bricks as plain Strings, let Ruby assemble the walls and the house with ordinary String operations, then turn the resulting strings into true Regexp instances for use in your code.

For instance:

LOWER_CASE_CHARS = 'a-z'
UPPER_CASE_CHARS = 'A-Z'
CHARS = LOWER_CASE_CHARS + UPPER_CASE_CHARS
DIGITS = '0-9'

CHARS_REGEX = /[#{ CHARS }]/
DIGITS_REGEX = /[#{ DIGITS }]/

WORDS = "#{ CHARS }#{ DIGITS }_"
WORDS_REGEX = /[#{ WORDS }]/

You keep building from small atomic characters and character classes and soon you'll have big regular expressions. Try pasting those one by one into IRB and you'll quickly get the hang of it.
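
Applied to the German classes from the question, the same approach might look like the sketch below. The module name GermanChars and its constant names are made up for illustration. The bricks are kept as Strings rather than Regexp objects, because interpolating a Regexp into a character class also inserts its (?-mix:...) wrapper, and those characters then become part of the class. Also note that a comma inside [...] matches a literal comma, and BZ without a hyphen means only the two letters B and Z.

```ruby
# Hypothetical module; the String constants hold character-class *bodies*.
module GermanChars
  LOWER_VOWELS = 'aeiouäöü'
  UPPER_VOWELS = 'AEIOUÄÖÜ'
  VOWELS       = LOWER_VOWELS + UPPER_VOWELS
  LETTERS      = 'a-zA-ZäöüÄÖÜß'

  # Wrap the bodies in brackets to get real Regexp instances.
  LETTER = /[#{LETTERS}]/
  VOWEL  = /[#{VOWELS}]/
  # A consonant is a letter that is not a vowel (&& is class intersection).
  CONSONANT = /[#{LETTERS}&&[^#{VOWELS}]]/
end

# One consonant followed by two vowels, as in the question.
pattern = /#{GermanChars::CONSONANT}#{GermanChars::VOWEL}{2}/
puts "azAZßäÜ"[pattern]  # prints "ßäÜ"
```

To reuse the constants across scripts, one option is to save the module in its own file, say german_chars.rb (a hypothetical name), and load it with require_relative 'german_chars' or by putting its directory on the load path. As far as I know, Ruby offers no way to define new escapes such as \L or \V, so interpolation stays the practical route.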

the Tin Man answered Nov 10 '22 12:11