I am just starting to learn Ruby (to eventually move to RoR), but I was just told that Ruby does not support Unicode. Is that true? How do Ruby programmers go about supporting Unicode?
Ruby supports Unicode, and that support has been enabled by default since Ruby 1.9.
In Ruby, source files and string literals are encoded in UTF-8 by default (this has been the default source encoding since Ruby 2.0). UTF-8 is a variable-width encoding in which a single character takes between 1 and 4 bytes. Other encodings, such as UTF-16 and UTF-32, are also available.
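A small sketch of what the 1-to-4-byte range looks like in practice. The sample characters are my own choice, written with `\u` escapes so the strings are UTF-8 regardless of how the file itself is saved; `String#length` counts characters while `String#bytesize` counts bytes.

```ruby
# Four sample characters (illustrative choice) that need 1, 2, 3 and 4 bytes in UTF-8.
samples = ["a", "\u00E9", "\u20AC", "\u{1F600}"]  # a, e-acute, euro sign, emoji

samples.each do |ch|
  # length counts characters, bytesize counts the underlying bytes
  puts "#{ch.inspect}: chars=#{ch.length}, bytes=#{ch.bytesize}, encoding=#{ch.encoding}"
end

char_counts = samples.map(&:length)
byte_counts = samples.map(&:bytesize)
```

Every sample is one character long, yet the byte counts differ, which is why byte-oriented code written for single-byte encodings tends to break on Unicode text.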
The Unicode Standard assigns a unique number (a code point) to every character, no matter what platform, device, application or language. It is supported by virtually all modern software, which allows text to be moved between different platforms, devices and applications without corruption.
String#encode returns a new string in the requested encoding; to change the original in place, use String#encode!. With x.encode("UTF-16").bytes you can see how the bytes differ from the standard UTF-8 bytes of the same string.
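As a rough illustration of the transcoding calls mentioned here, the sketch below compares the bytes of one string in two encodings. The variable names are hypothetical, and I use "UTF-16LE" rather than plain "UTF-16" as an assumption, so that the byte order is fixed and no byte order mark is prepended.

```ruby
x = "h\u00E9llo"                  # "hello" with an e-acute; UTF-8 because of the \u escape

utf8_bytes = x.bytes              # bytes of the UTF-8 form
utf16 = x.encode("UTF-16LE")      # returns a NEW string; x itself is untouched
utf16_bytes = utf16.bytes         # two bytes per character for this text

puts "UTF-8:    #{utf8_bytes.inspect}"
puts "UTF-16LE: #{utf16_bytes.inspect}"

# encode! transcodes the receiver in place instead of returning a copy.
y = x.dup
y.encode!("UTF-16LE")

# Converting back gives a string equal to the original.
round_trip = utf16.encode("UTF-8")
```

Only the byte representation changes; the characters are the same, which is why the round trip back to UTF-8 compares equal to `x`.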
What you heard is outdated and applies (only partially) to Ruby 1.8 and earlier. The latest stable version of Ruby (1.9) supports no fewer than 95 different character encodings (counted on my system just now). This includes pretty much all known Unicode Transformation Formats, including UTF-8.
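If you want to repeat that count on your own machine, something like the following should work on Ruby 1.9 or later. The exact number depends on the Ruby version and build, so the sketch prints it rather than assuming a value.

```ruby
# Count the encodings this Ruby build knows about.
encoding_count = Encoding.list.size
puts "Encodings available: #{encoding_count}"

# Pick out the Unicode Transformation Formats by name.
utf_names = Encoding.name_list.grep(/\AUTF-/)
puts "UTF encodings: #{utf_names.sort.join(', ')}"

# The encoding Ruby assumes for data read from outside the program.
puts "Default external encoding: #{Encoding.default_external}"
```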
The previous stable version of Ruby (1.8) has partial support for UTF-8.
If you use Rails, it takes care of default UTF-8 encoding for you. If all you need is UTF-8 awareness, Rails will work for you whether you run Ruby 1.9 or Ruby 1.8. If you have very specific character encoding requirements, you should aim for Ruby 1.9.
If you're really interested, here is a series of articles describing the encoding issues in Ruby 1.8 and how they were worked around, and eventually solved in Ruby 1.9. Rails still includes workarounds for many common flaws in Ruby 1.8.