Incompatible character encoding in simple Sinatra app

I have a very simple Sinatra app running on Ruby 1.9.3 that uses ERB and markdown templates. I've stripped it right down to demonstrate the problem.

This is running Sinatra 1.3.2 on Mac OS X Snow Leopard. For the markdown I'm using rdiscount 1.6.8.

The main Ruby file contains

get '/services' do
  erb :services
end

The services.erb file has the following in it

<%= markdown :'content/service1' %>
£

Inside the markdown file I have just a single line

£

When I run the Sinatra app and load the 'services' page I get the exception "Encoding::CompatibilityError at /services: incompatible character encodings: UTF-8 and ASCII-8BIT". It is raised on the second line of the ERB file (the one containing just the '£').

I've done lots of Googling and I can't for the life of me figure out why this is happening. The ERB and markdown files are UTF-8 on my local disk, but obviously they are being loaded by Sinatra and turned into strings, and I've no idea how to tell what encoding those strings are.

If I force Sinatra to use ASCII-8BIT (by adding settings.default_encoding = 'ASCII-8BIT' to the top of my main Sinatra Ruby file) then no exception is thrown but the '£' characters come out looking wrong.

Any pointers?

Ben, asked Apr 26 '12 21:04


1 Answer

This is an issue in Tilt, the templating system that Sinatra uses (and is being considered for Rails). Have a look at issues #75 and #107.

The problem comes down to how Tilt reads template files from disk: it uses binread. This means that the source string handed to the actual template engine is tagged with the encoding ASCII-8BIT, which in effect says the bytes are binary data of unknown encoding.
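To see the difference outside Sinatra, here is a minimal sketch that writes a one-character file and reads it back both ways. It does not use Tilt at all; the helper name read_both_ways and the temporary file are assumptions made for illustration.

```ruby
require 'tempfile'

# Hypothetical helper: write "£" to a temp file as UTF-8 bytes,
# then read the same file back with binread and with read.
def read_both_ways
  Tempfile.create(['pound', '.md']) do |f|
    f.binmode
    f.write("\u00A3")   # "£" is the two bytes C2 A3 in UTF-8
    f.flush
    binary = File.binread(f.path)                  # tagged ASCII-8BIT
    text   = File.read(f.path, encoding: 'UTF-8')  # tagged UTF-8
    [binary, text]
  end
end

binary, text = read_both_ways
puts binary.encoding              # the encoding tag binread applies
puts text.encoding                # the encoding tag requested from read
puts binary.bytes == text.bytes   # same bytes, only the tag differs
```

Both strings hold identical bytes; only the encoding tag attached to them differs, and that tag is what Ruby consults when strings are combined.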

RDiscount has code to set the encoding of the output to match the input, but this isn’t much help when the input encoding is ASCII-8BIT; the result is given the same encoding. The same thing (or something similar) happens with Kramdown, so simply switching Markdown engines won’t solve this.

This causes problems when the template has non-ASCII characters (e.g. £) and you try to combine the result with other UTF-8 encoded strings. If the template contains only ASCII characters, it is compatible with UTF-8 and Ruby can combine the two strings. If not, you get the CompatibilityError that you see.
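The compatibility rule can be reproduced in plain Ruby, without Sinatra or Tilt. In this sketch the helpers as_binary and try_concat are hypothetical names; as_binary stands in for what binread does to a template's contents.

```ruby
# Hypothetical helper: return a copy of str tagged ASCII-8BIT,
# mimicking a string that came from File.binread.
def as_binary(str)
  str.dup.force_encoding(Encoding::ASCII_8BIT)
end

# Hypothetical helper: try to concatenate; return the result,
# or the exception class if Ruby refuses to combine the strings.
def try_concat(utf8, other)
  utf8 + other
rescue Encoding::CompatibilityError => e
  e.class
end

utf8 = "\u00A3"   # "£" tagged UTF-8, like the literal in the ERB file

# ASCII-only binary data is compatible with UTF-8, so this succeeds.
p try_concat(utf8, as_binary("price"))

# Binary data with non-ASCII bytes is not, so this is rejected.
p try_concat(utf8, as_binary("\u00A3"))
```

This is why the error only appears once a '£' (or any other non-ASCII character) is present in the markdown file.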

A possible workaround is to read the template files yourself, and pass in the resulting string with the correct encoding to Tilt:

<%= markdown File.read './views/pound.md' %>
£

By reading the file yourself with read instead of binread, the string is tagged with Ruby's default external encoding rather than ASCII-8BIT. Provided that default is UTF-8 on your system, the string is compatible with the rest of the erb file. You may want to read the file in once and cache the contents somewhere if you try this.
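One way to do the read-once caching is sketched below. The constant TEMPLATE_CACHE and the helper read_utf8_cached are hypothetical names, not part of Sinatra or Tilt, and the sketch passes the encoding explicitly instead of relying on the default external encoding.

```ruby
# Hypothetical in-memory cache, keyed by file path.
TEMPLATE_CACHE = {}

# Hypothetical helper: read a file as UTF-8 the first time it is
# requested and return the cached string on later calls.
def read_utf8_cached(path)
  TEMPLATE_CACHE[path] ||= File.read(path, encoding: 'UTF-8')
end

# Possible use in the ERB file (assumes the helper is visible there):
#   <%= markdown read_utf8_cached('./views/pound.md') %>
```

The cache never expires, so edits to the markdown file are not picked up until the app restarts; that is probably acceptable in production but inconvenient during development.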

An alternative workaround would be to capture the output of the markdown method and use force_encoding on it:

<%= markdown(:pound).force_encoding('utf-8') %>
£

This is possible because although the encoding is ASCII-8BIT, you know that the bytes in the string really are utf-8 encoded, so you can just change the encoding.
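A small sketch of what force_encoding does here, in plain Ruby. The helper retag_as_utf8 is a hypothetical name, and the rendered string is a hand-built stand-in for the markdown method's output, not real RDiscount output.

```ruby
# Hypothetical helper: return a copy of str retagged as UTF-8.
# force_encoding changes only the tag; the bytes are left untouched.
def retag_as_utf8(str)
  str.dup.force_encoding(Encoding::UTF_8)
end

# Stand-in for rendered markdown that came back tagged ASCII-8BIT.
rendered = "<p>\u00A3</p>".dup.force_encoding(Encoding::ASCII_8BIT)

fixed = retag_as_utf8(rendered)
puts fixed.encoding                 # the new tag
puts fixed.valid_encoding?          # the bytes really are valid UTF-8
puts fixed.bytes == rendered.bytes  # no bytes were changed

# The retagged string now combines with other UTF-8 text.
combined = fixed + "\n\u00A3"
puts combined.encoding
```

If the bytes were not actually UTF-8, valid_encoding? would return false, so it is a cheap check to make before trusting the retagged string.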

matt, answered Oct 13 '22 05:10