
String encoding issue in Ruby

Tags:

ruby

encoding

In Ruby 1.9.3-p484 I have to construct an SMPP packet, but when I pass the constructed packet's content as a string to the method that delivers it, a strange \xC2 byte appears in the content. Having investigated the issue, I found the following interesting gotcha:

"\u008E".force_encoding("BINARY")
 => "\xC2\x8E"

Why does \u008E become \xC2\x8E when I want to use binary encoding? Why not \x00\x8E?

Nucc asked May 31 '26 09:05


2 Answers

Because force_encoding only relabels the text as binary; it does not change the bytes, so you are seeing the string exactly as it is stored in memory. It is stored as MBCS (Multi-Byte Character Set) data, here UTF-8 because of the \u escape, and any character above \x7F takes at least two bytes. So you see:

"\u008E".force_encoding("BINARY")
# => "\xC2\x8E"
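
To make the point above concrete, here is a minimal sketch (the variable names are only illustrative, not from the question) that compares the bytes before and after force_encoding, and shows two ways to get the single byte 0x8E that the SMPP packet presumably needs. It assumes the code point is below 256, so it fits in one byte.

```ruby
s = "\u008E"                        # a \u escape yields a UTF-8 string
utf8_bytes = s.bytes.to_a           # [194, 142], i.e. 0xC2 0x8E

# force_encoding changes only the encoding label, never the bytes.
relabeled = s.dup.force_encoding("BINARY")
relabeled_bytes = relabeled.bytes.to_a   # still [194, 142]

# Option 1: build the raw byte directly from its numeric value.
packed = [0x8E].pack("C")           # one-byte ASCII-8BIT (BINARY) string
packed_bytes = packed.bytes.to_a    # [142]

# Option 2: turn each code point (< 256 assumed) into a single byte.
converted = s.unpack("U*").pack("C*")
converted_bytes = converted.bytes.to_a   # [142]

puts utf8_bytes.inspect
puts relabeled_bytes.inspect
puts packed_bytes.inspect
puts converted_bytes.inspect
```

For binary protocol data it is usually safer to build the payload from byte values (pack, or "\x8E" escapes) than from \u escapes, since \u always produces UTF-8 text.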

Малъ Скрылевъ answered Jun 02 '26 00:06


This is the binary representation. Take a look:

At Tue, 27 Jul 2010 22:21:31 +0900, Heesob Park wrote in :

I noticed String#inspect results \x{XXXX} for the encoding other than Unicode.

Is there any possibility that \x{XXXX} is accepted as an escape sequence of string?

irb(main):004:0> a = "\xC7\xD1\xB1\xDB"

This is in binary representation.

irb(main):010:0> a1 => "\x{B1DB}"

https://bugs.ruby-lang.org/issues/3619

This is the codepoint representation.

Guilherme answered Jun 02 '26 02:06


