 

How to decode Chinese hex string into Chinese characters in Ruby or JavaScript?

I am working on a Rails app.

I am using an API that returns some Chinese provinces. The API returns the provinces in hex strings, for example:

{ "\xE5\x8C\x97\xE4\xBA\xAC" => "some data" }

My JavaScript calls a controller that returns this hash. I put all the province strings into a dropdown, but they show up as a black diamond with a question mark in the middle. How do I convert the Ruby hex string into the actual Chinese characters, 北京? Or, if possible, can I convert the hex string into Chinese characters in JavaScript?

gruuuvy asked Sep 30 '22 08:09


1 Answer

The bytes \xE5\x8C\x97 are the UTF-8 representation of 北 and \xE4\xBA\xAC is the UTF-8 representation of 京. So this string:

"\xE5\x8C\x97\xE4\xBA\xAC"

is 北京 if the bytes are interpreted as UTF-8. The fact that you're seeing hex escapes instead of Chinese characters suggests that Ruby thinks the string's encoding is binary (ASCII-8BIT):

> s = "\xE5\x8C\x97\xE4\xBA\xAC"
 => "北京" 
> s.encoding
 => #<Encoding:UTF-8> 
> s.force_encoding('binary')
 => "\xE5\x8C\x97\xE4\xBA\xAC"

So the API you're talking to is speaking UTF-8, but somewhere your application is losing track of what encoding that string is supposed to be. If you force the encoding to be UTF-8, the problem goes away:

> s.force_encoding('utf-8')
 => "北京" 
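To check the byte mapping yourself, here is a small Ruby sketch (the method names `utf8_hex` and `from_bytes` are made up for illustration). It prints the UTF-8 bytes of each character in the same `\xNN` form Ruby uses when inspecting a binary string, then goes the other way: it packs the raw byte values into a binary string and relabels it as UTF-8 with `force_encoding`.

```ruby
# Hypothetical helper: hex-escape a string's UTF-8 bytes, e.g. "北" -> \xE5\x8C\x97
def utf8_hex(str)
  str.encode(Encoding::UTF_8).bytes.map { |b| format('\\x%02X', b) }.join
end

# Hypothetical helper: rebuild a string from raw byte values and label it UTF-8.
def from_bytes(bytes)
  raw = bytes.pack('C*') # Array#pack('C*') returns an ASCII-8BIT (binary) string
  raw.force_encoding(Encoding::UTF_8) # changes the label only, not the bytes
end

puts utf8_hex('北') # prints \xE5\x8C\x97
puts utf8_hex('京') # prints \xE4\xBA\xAC

s = from_bytes([0xE5, 0x8C, 0x97, 0xE4, 0xBA, 0xAC])
puts s.valid_encoding? # prints true
puts s                 # prints 北京
```

`force_encoding` never touches the bytes; it only changes how Ruby interprets them, which is why it's the right tool here. By contrast, `encode('UTF-8')` on a binary string containing bytes above 0x7F raises `Encoding::UndefinedConversionError`, because Ruby has no way to know what those bytes were meant to be.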

You should fix this encoding problem at the very edge of your application, where it reads data from the remote API. Once that's done, everything you care about should be sensible UTF-8. That should also fix your JavaScript problem, since JavaScript is quite happy to work with UTF-8.
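One way to do the fix at the edge is sketched below, assuming the API response has already been parsed into nested hashes, arrays, and strings like the hash in the question. `deep_utf8` is a hypothetical helper, not part of Ruby or Rails: it walks the structure, relabels every string (hash keys included) as UTF-8, and raises if a string isn't valid UTF-8 so corrupt data doesn't slip through silently.

```ruby
# Hypothetical helper: relabel every String in a parsed API response as UTF-8.
# Assumes the remote API really sends UTF-8 bytes; raises on anything that isn't.
def deep_utf8(obj)
  case obj
  when Hash
    obj.each_with_object({}) { |(k, v), out| out[deep_utf8(k)] = deep_utf8(v) }
  when Array
    obj.map { |v| deep_utf8(v) }
  when String
    s = obj.dup.force_encoding(Encoding::UTF_8) # dup so frozen inputs are fine
    raise ArgumentError, "invalid UTF-8: #{obj.inspect}" unless s.valid_encoding?
    s
  else
    obj # numbers, nil, booleans pass through unchanged
  end
end

# Simulate the problem: a key that arrived as a binary (ASCII-8BIT) string.
response = { "\xE5\x8C\x97\xE4\xBA\xAC".b => 'some data' }
clean = deep_utf8(response)
puts clean.keys.first          # prints 北京
puts clean.keys.first.encoding # prints UTF-8
```

Where to call it depends on the HTTP client you're using; the idea is to run it once, right after parsing the response, so nothing downstream ever sees a binary string. If you'd rather replace bad bytes than raise, `String#scrub` (Ruby 2.1+) substitutes a replacement character for invalid sequences.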

mu is too short answered Oct 03 '22 00:10