I have a website that's running on Ruby 1.8.7. I have a validation on incoming posts that allows a maximum of 12000 characters. Spaces count as characters, and tabs and carriage returns are stripped before the post is validated.
Here is the post that is subjected to validation http://pastie.org/5047582
In Ruby 1.9 the string length shows up as 11909, which is correct. But when I check the length on Ruby 1.8.7, it turns out to be 12044.
I used codepad.org to run this Ruby code, which gives me http://codepad.org/OxgSuKGZ (it outputs the length as 12044, which is wrong), but when I run the same code in the console at codeacademy.org the string length is 11909.
Can anybody explain why this is happening?
Thanks
This is a Unicode issue. The string you are using contains characters outside the ASCII range, and the UTF-8 encoding that is frequently used encodes those as 2 (or more) bytes.
Ruby 1.8 did not handle Unicode properly, and length simply gives the number of bytes in the string, which results in fun stuff like:
"ą".length
=> 2
Ruby 1.9 has better Unicode handling. This includes length returning the actual number of characters in the string, as long as Ruby knows the encoding:
"ä".length
=> 1
One possible workaround in Ruby 1.8 is using regular expressions, which can be made Unicode aware:
"ą".scan(/./mu).size
=> 1
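For the validation itself, something along these lines might work on both 1.8.7 and 1.9. It is only a sketch: char_count, post_within_limit? and MAX_POST_LENGTH are names I made up for illustration, and it assumes the post body is valid UTF-8. It uses unpack("U*") as an alternative to the regex, which decodes the string into code points instead of counting bytes.

```ruby
# Hypothetical helper (not from the question): counts characters,
# not bytes, assuming the string is valid UTF-8.
def char_count(str)
  # "U*" decodes the string as UTF-8 and yields one integer per code point.
  str.unpack("U*").size
end

# Limit taken from the question; the constant name is an assumption.
MAX_POST_LENGTH = 12000

# Hypothetical validation check built on char_count.
def post_within_limit?(body)
  char_count(body) <= MAX_POST_LENGTH
end

sample = "zażółć"
puts char_count(sample)        # 6 characters
puts sample.scan(/./mu).size   # 6, same result via the Unicode-aware regex
puts sample.unpack("C*").size  # 10 bytes, which is what 1.8's length reports
```

Note that this counts code points, so a letter written as a base character plus a combining accent would count as two.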