Zlib and utf-8 in ruby

I'm trying to use zlib to compress some lengthy strings, some of which may contain Unicode characters. At the moment I'm doing this in Ruby, but I think the same problem would apply in any language. Here's the super basic implementation:

require 'zlib'

example = "“hello world”" # note the unicode quotes
compressed = Zlib.deflate(example)
puts Zlib.inflate(compressed)

The issue here is that the text comes out as this:

\xE2\x80\x9Chello world\xE2\x80\x9D

...no Unicode quotes, just weird unrecognizable characters. Does anyone know of a way that Zlib can be used while retaining Unicode characters? Bonus points for an answer in Ruby : )

Jeff Escalante asked Nov 23 '13 04:11


1 Answer

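Zlib.inflate returns a string tagged as ASCII-8BIT (binary). The bytes themselves are intact; Ruby just no longer knows they are UTF-8, so it prints the non-ASCII bytes as escapes. To fix it, force the original encoding: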

require 'zlib'

input = "“hello world”"
compressed = Zlib.deflate(input)
output = Zlib.inflate(compressed).force_encoding(input.encoding)

Or set the encoding manually:

output = Zlib.inflate(compressed).force_encoding('utf-8')
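
To make the point above concrete, here is a minimal sketch showing that the round trip preserves the bytes and only the encoding tag changes. The helper names `compress_text` and `decompress_text` are hypothetical (they are not part of Zlib), and the sketch assumes the text is always meant to be UTF-8:

```ruby
require 'zlib'

# Hypothetical helper (not part of Zlib): normalize to UTF-8, then deflate.
# Assumes the input can be transcoded to UTF-8; encode raises otherwise.
def compress_text(text)
  Zlib.deflate(text.encode(Encoding::UTF_8))
end

# Hypothetical helper: inflate, then relabel the bytes as UTF-8.
# force_encoding only changes the tag, so we also check that the
# bytes really are valid UTF-8 before handing the string back.
def decompress_text(data)
  text = Zlib.inflate(data).force_encoding(Encoding::UTF_8)
  raise ArgumentError, 'inflated data is not valid UTF-8' unless text.valid_encoding?
  text
end

original = "\u201Chello world\u201D" # same string as above, written with escapes

# Plain round trip: identical bytes, but tagged as binary.
raw = Zlib.inflate(Zlib.deflate(original))
puts raw.encoding == Encoding::BINARY # Encoding::BINARY is an alias of ASCII-8BIT
puts raw.bytes == original.bytes

# Round trip through the helpers: the string compares equal again.
restored = decompress_text(compress_text(original))
puts restored.encoding
puts restored == original
```

The `valid_encoding?` check is optional. It is there because `force_encoding` never validates anything, so relabelling data that was not UTF-8 to begin with would otherwise go unnoticed until later.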
Casper answered Nov 16 '22 07:11