I'm using Ruby 1.9 and trying to find out which regex I need to make this true:
Encoding.default_internal = Encoding.default_external = 'utf-8'
"föö".match(/(\w+)/u)[1] == "föö"
# => false
You can manually turn on Unicode matching using the inline (?u) option syntax:
"föö".match(/(?u)(\w+)/)[1] == "föö"
# => true
However, using Unicode Property Syntax (steenslag's answer) or POSIX Brackets Syntax is better style, since they both automatically respect Unicode codepoints:
"föö".match(/(\p{word}+)/)[1] == "föö"
# => true
"föö".match(/([[:word:]]+)/)[1] == "föö"
# => true
See this blog post for more info about matching Unicode characters in Ruby regexes.
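To compare the approaches side by side, here is a small sketch. It assumes Ruby 2.0 or newer and a UTF-8 source file; `first_word` and `PATTERNS` are hypothetical names introduced only for this illustration. Plain `\w` should stop after the ASCII `f`, while the other three patterns should return the whole string.

```ruby
# encoding: utf-8

# Hypothetical helper: returns the first run matched by `pattern`, or nil.
def first_word(text, pattern)
  match = text.match(pattern)
  match && match[0]
end

# The four ways of writing "one or more word characters" discussed above.
PATTERNS = {
  'plain \w'    => /\w+/,          # ASCII-only word characters
  'inline (?u)' => /(?u)\w+/,      # \w with the Unicode option switched on
  '\p{Word}'    => /\p{Word}+/,    # Unicode property syntax
  '[[:word:]]'  => /[[:word:]]+/   # POSIX bracket syntax
}

# Print what each pattern captures from the same input.
PATTERNS.each do |label, pattern|
  puts "#{label}: #{first_word('föö', pattern).inspect}"
end
```

Running this prints one line per pattern, which makes it easy to see which forms respect non-ASCII letters on your Ruby version.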
# encoding=utf-8
p "föö".match(/\p{Word}+/)[0] == "föö"