How to convert text to unicode in Rails?

In my database, I have the following entry

id | name       | info
1  | John Smith | Çö ¿¬¼

As you can tell, the info column displays incorrectly; it's actually Korean. In Chrome, when I switch the browser encoding from UTF-8 to Korean ('euc-kr', I think), I manage to view the text as such:

id | name       | info
1  | John Smith | 횉철 쩔짭쩌

I then manually copy the text into the info column in the database and save, and now I can view it in UTF-8, without switching my browser's encoding.

Awesome. Now I'd like to get that same thing done in Rails, not manually. So starting with the original entry again, I go to the console and type:

require 'iconv'
u = User.find(1)
info = u.info
new_info = Iconv.iconv('euc-kr','UTF-8', info)
u.update_attribute('info', new_info)

However, what I end up with is something resembling \x{A2AF}\x{A8FA}\x{A1C6} \x{A2A5}\x{A8A2} in the database, not 횉철 쩔짭쩌.

I have a very basic understanding of unicode and encoding.

Can someone please explain what's going on here and how to get around that? The desired result is what I achieved manually.

Thanks!

Yuval Karmi asked Jan 22 '26 03:01


1 Answer

Wow. I'm beating myself over the head now. After hours of trying to resolve this, I finally figured it out myself a few minutes after I posted a question here.

The solution consists of three simple steps:

STEP 1:

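I almost had it right. Iconv.iconv takes the target encoding first and the source encoding second, so my original call was converting from UTF-8 to euc-kr. It should go the other way around, from euc-kr to UTF-8, as such: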

Iconv.iconv('UTF-8', 'euc-kr', info)
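
For readers on a newer Ruby: Iconv was removed from the standard library in Ruby 2.0. Here is a minimal sketch of the same direction of conversion (euc-kr in, UTF-8 out) using String#force_encoding and String#encode instead. The helper name euc_kr_to_utf8 is made up for illustration, and the sketch assumes the string really holds EUC-KR bytes that were just labeled with the wrong encoding.

```ruby
# Hypothetical helper (not part of Rails or Iconv): treat the bytes as EUC-KR,
# then transcode them to UTF-8. dup keeps the caller's string untouched,
# because force_encoding modifies its receiver in place.
def euc_kr_to_utf8(raw)
  raw.dup.force_encoding(Encoding::EUC_KR).encode(Encoding::UTF_8)
end

# Simulate a mislabeled value: EUC-KR bytes carrying no useful encoding label.
raw = '한글'.encode(Encoding::EUC_KR).force_encoding(Encoding::BINARY)

fixed = euc_kr_to_utf8(raw)
puts fixed           # => 한글
puts fixed.encoding  # => UTF-8
```

force_encoding only changes the label on the bytes, while encode actually rewrites them, which is why both calls are needed here.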

STEP 2:

Some byte sequences in the text might still fail to convert and raise an error, so to be safe I append //IGNORE to the target encoding, which tells Iconv to skip whatever it cannot convert:

Iconv.iconv('UTF-8//IGNORE', 'euc-kr', info)
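
If you are using String#encode rather than Iconv (this assumes Ruby 1.9 or later), a roughly analogous way to drop unconvertible bytes is the invalid:, undef: and replace: options. This is a sketch, not an exact reproduction of //IGNORE, and the helper name euc_kr_to_utf8_lossy is hypothetical.

```ruby
# Hypothetical helper: convert EUC-KR bytes to UTF-8, silently dropping
# anything that cannot be converted instead of raising an exception.
def euc_kr_to_utf8_lossy(raw)
  raw.dup.force_encoding(Encoding::EUC_KR).encode(
    Encoding::UTF_8,
    invalid: :replace, # byte sequences that are not valid EUC-KR
    undef:   :replace, # characters with no mapping in the target encoding
    replace: ''        # replace them with nothing, i.e. drop them
  )
end

good    = '한글'.encode(Encoding::EUC_KR).b
damaged = good + "\xFF".b # append one byte that is not valid EUC-KR

puts euc_kr_to_utf8_lossy(damaged) # => 한글
```

Dropping bytes silently can hide real data loss, so it may be worth comparing lengths or logging the affected rows before overwriting anything.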

Finally, I actually get REAL KOREAN TEXT, yay! The problem is, when I try to insert it into the database, it's still inserting something along the lines of:

UPDATE `users` SET `info` = '--- \n- \"\\xEC\\xB2\\xA0\\xEC\\xB1\\x8C...' etc...

Even though it turns out I have the right text. So why is that? Onto the last step.

STEP 3:

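Turns out Iconv.iconv returns an array (one element per input string), not a string, which is why the UPDATE above starts with '--- \n- ' instead of the text itself. And so, we merge it with join: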

Iconv.iconv('UTF-8//IGNORE', 'euc-kr', info).join

And this actually works!
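
To see why the array matters, here is a small plain-Ruby sketch that needs neither Iconv nor Rails. The variable converted is a stand-in for the return value of Iconv.iconv, and the assumption (my reading of the failed UPDATE, not something confirmed from the Rails source) is that the array was serialized as YAML before being written to the string column.

```ruby
require 'yaml'

# Stand-in for the return value of Iconv.iconv: an Array holding one
# converted String per input string.
converted = ['철수']

# Serialized as YAML, an Array becomes a document marker ("---") followed by
# a "- " list item, which resembles the prefix seen in the UPDATE statement.
puts converted.to_yaml

# join collapses the Array into the plain String the column should hold.
puts converted.join
```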

The final code:

require 'iconv'
u = User.find(1)
info = u.info
# Iconv.iconv(to, from, *strings) returns an Array, so join it into one String.
new_info = Iconv.iconv('UTF-8//IGNORE', 'euc-kr', info).join
u.update_attribute('info', new_info)

Hope this helps whoever sees this (and knowing myself, probably future me).

Yuval Karmi answered Jan 24 '26 21:01