Zlib and utf-8 in ruby

Question

I'm trying to use zlib to compress some lengthy strings, some of which may contain Unicode characters. At the moment I'm doing this in Ruby, but I think this would apply in any language. Here's the super basic implementation:

require 'zlib'

example = "“hello world”" # note the unicode quotes
compressed = Zlib.deflate(example)
puts Zlib.inflate(compressed)

The issue here is that the text comes out as this:

\xE2\x80\x9Chello world\xE2\x80\x9D

...no unicode quotes, just weird unrecognizable characters. Does anyone know of a way that Zlib can be used while retaining unicode characters? Bonus points for an answer in ruby : )

Casper · Accepted Answer

Zlib.inflate returns a string tagged as ASCII-8BIT (binary) by default. The bytes themselves are intact; only the encoding label is lost. To fix it, just force the original encoding:

require 'zlib'

input = "“hello world”" 
compressed = Zlib.deflate(input)
output = Zlib.inflate(compressed).force_encoding(input.encoding)

Or set the encoding manually:

output = Zlib.inflate(compressed).force_encoding('utf-8')
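
To see why this works, it may help to check that the inflated bytes are identical to the input and that only the encoding tag differs. The following is a minimal sketch, assuming the source file is saved as UTF-8 (Ruby's default for string literals); the variable names `raw`, `raw_encoding`, and `same_bytes` are just for illustration.

```ruby
require 'zlib'

input = "“hello world”"                 # UTF-8 literal with curly quotes
compressed = Zlib.deflate(input)
raw = Zlib.inflate(compressed)

# The inflated string is tagged as binary, but its bytes match the input.
raw_encoding = raw.encoding              # Encoding::ASCII_8BIT
same_bytes = raw.bytes == input.bytes

# force_encoding only relabels the existing bytes; it does not transcode them.
# It modifies the string in place and returns it.
output = raw.force_encoding(Encoding::UTF_8)

puts output.valid_encoding?              # the bytes form valid UTF-8
puts output == input                     # the round trip is lossless
```

Because force_encoding does not validate anything, calling valid_encoding? afterwards is a cheap sanity check when the compressed data might not have been UTF-8 to begin with.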

Tags: ruby, unicode, utf-8, zlib

Jeff Escalante
