Convert a unicode string to characters in Ruby?

Tags: string, ruby

1 Answer

You seem to have got your encodings into a bit of a mix-up. If you haven’t already, you should first read Joel Spolsky’s article The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!), which provides a good introduction to this type of thing. There is a good set of articles on how Ruby handles character encodings at http://graysoftinc.com/character-encodings/understanding-m17n-multilingualization. You could also have a look at the Ruby docs for String and Encoding.

In this specific case, the string l\u0092issue means that the second character is the character with the Unicode code point U+0092 (0x92). This code point is PRIVATE USE TWO (see the chart), a C1 control character with no printable glyph, so it is almost certainly not the character that was intended.
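As a quick check, here is a minimal sketch that lists the code points of the string from the question. The variable names are only for illustration, and it assumes the source file is UTF-8 (the default since Ruby 2.0).

```ruby
# The string from the question: 'l', U+0092, then "issue".
s = "l\u0092issue"

# List each character's code point in hexadecimal.
hex_points = s.codepoints.map { |cp| format("U+%04X", cp) }
puts hex_points.join(" ")  # prints U+006C U+0092 U+0069 U+0073 U+0073 U+0075 U+0065

# U+0092 lies in the C1 control range (U+0080..U+009F), so it has no visible glyph.
c1_control = (0x80..0x9F).cover?(s[1].ord)
puts c1_control  # prints true
```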

However, looking at the Windows CP-1252 encoding, position 0x92 is occupied by the character ’, so if this is the missing character then the string would be l’issue, which looks a lot more likely even though I don’t speak French.
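You can confirm that mapping from Ruby itself. This is just a sketch: it builds the single byte 0x92, labels it as Windows-1252 (Ruby's name for CP-1252) and transcodes it to UTF-8.

```ruby
# A single raw byte 0x92, labelled as Windows-1252 (CP-1252).
byte = [0x92].pack("C").force_encoding("Windows-1252")

# Transcode to UTF-8 to see which character that byte stands for.
char = byte.encode("UTF-8")
puts char                        # the right single quotation mark
puts format("U+%04X", char.ord)  # prints U+2019
```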

What I suspect has happened is your program has received the string l’issue encoded in CP-1252, but has assumed it was encoded in ISO-8859-1 (ISO-8859-1 and CP-1252 are quite closely related) and re-encoded it to UTF-8. ISO-8859-1 maps byte 0x92 straight to code point U+0092, which leaves you with the string you now have.
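To illustrate that guess, the following sketch reproduces the suspected mistake. It is an assumption about what your program did, not something I can see from the question, and the variable names are made up.

```ruby
# The intended text, as the CP-1252 bytes a sender might have produced.
original = "l\u2019issue".encode("Windows-1252")

# The suspected mistake: treating those bytes as ISO-8859-1...
mislabelled = original.dup.force_encoding("ISO-8859-1")

# ...and then transcoding to UTF-8, which turns byte 0x92 into U+0092.
garbled = mislabelled.encode("UTF-8")
puts garbled == "l\u0092issue"  # prints true
```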

The real fix for you is to be careful about the encodings of any strings that enter (and leave) your program, and how you manage them.
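One way to do that is to label incoming bytes with their actual encoding at the point they enter the program, and convert to UTF-8 once. The helper below is a hypothetical sketch (to_utf8 is not a built-in method), and it assumes you know the source encoding from the file format, protocol or headers.

```ruby
# Hypothetical helper: label raw input with its known encoding, then convert to UTF-8.
def to_utf8(raw, source_encoding)
  text = raw.dup.force_encoding(source_encoding)
  # Reject byte sequences that are malformed in the declared encoding.
  raise ArgumentError, "input is not valid #{source_encoding}" unless text.valid_encoding?
  text.encode("UTF-8")
end

# The bytes of "l’issue" as a CP-1252 source would send them.
raw = [0x6C, 0x92, 0x69, 0x73, 0x73, 0x75, 0x65].pack("C*")
text = to_utf8(raw, "Windows-1252")
puts text == "l\u2019issue"  # prints true
```

Working on a copy (dup) keeps the caller's string untouched, since force_encoding changes the receiver in place.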

To transform your string to l’issue, you can encode it back to ISO-8859-1, then use force_encoding to tell Ruby the real encoding of CP-1252, and then you can re-encode to UTF-8:

2.1.0 :001 > s = "l\u0092issue"
 => "l\u0092issue" 
2.1.0 :002 > s = s.encode('iso-8859-1')
 => "l\x92issue" 
2.1.0 :003 > s.force_encoding('cp1252')
 => "l\x92issue" 
2.1.0 :004 > s.encode('utf-8')
 => "l’issue"

This is only really a demonstration of what is going on though. The real solution is to make sure you’re handling encodings correctly.

answered Oct 05 '22 05:10

matt
