I have a problem with one of my validation regexes when using non-ASCII UTF-8 characters. I ran a few experiments, and it appears that Ruby regexes behave differently in a Rails environment than in plain Ruby.
Here is my experiment with a Chinese string.
In plain Ruby:
string = "運動會"
puts string[/\A[\w]*\z/]
=> match "運動會" - ok
In Rails:
# coding: utf-8
task :test => :environment do
  string = "運動會"
  puts string[/\A[\w]*\z/]
end
$ rake test
=> nothing - not ok
If I omit # coding: utf-8, it fails with invalid multibyte char (US-ASCII). Anyway, even with the comment in place, it doesn't match.
Of course, I have checked everything (Ruby version, script files encoded in UTF-8, ...).
I use :
So my conclusion is that Rails alters the way regexes behave, and I did not find a way to make them behave as in plain Ruby.
OK, I found an answer to my problem. In Ruby 1.9, \w matches only ASCII characters, whereas in Ruby 1.8 it matched all Unicode characters. In Ruby 1.9 we now have to use: [\w\P{ASCII}]
More info: http://www.ruby-forum.com/topic/210770
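To make the difference concrete, here is a minimal sketch for Ruby 1.9 or later comparing plain \w with the [\w\P{ASCII}] workaround from the answer. The \p{Word} pattern is not from the original post; it is an alternative I am adding for comparison, as are the constant names, which are purely illustrative.

```ruby
# coding: utf-8

STRING = "運動會"

# \w is ASCII-only in Ruby 1.9+, so it cannot match the Chinese characters.
ASCII_WORD = /\A\w*\z/

# Workaround from the answer: ASCII word characters plus anything non-ASCII.
NON_ASCII = /\A[\w\P{ASCII}]*\z/

# Alternative (an addition, not from the original post): the Unicode-aware
# "Word" property, covering letters, marks, numbers and connector punctuation.
UNICODE_WORD = /\A\p{Word}*\z/

p STRING[ASCII_WORD]   # => nil
p STRING[NON_ASCII]    # => "運動會"
p STRING[UNICODE_WORD] # => "運動會"

# The two fixes differ on non-ASCII punctuation such as the ideographic comma.
p "運動會、"[NON_ASCII]    # => "運動會、"
p "運動會、"[UNICODE_WORD] # => nil
```

Note that [\w\P{ASCII}] accepts every non-ASCII character, punctuation included, so for a validation regex \p{Word} may be the stricter choice, depending on what the field should allow.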