
Make Ruby 1.9 treat all source files as UTF-8 encoded (even if recompiling the interpreter is necessary)

I want to port a Rails app from Ruby 1.8.7 to 1.9.2. Some of the files contain umlauts like ä/ö/ü, both within strings and in comments. The files were saved as UTF-8 but without a BOM (byte order mark) at the beginning.

As you might know, Ruby 1.9 refuses to parse these files and fails with an invalid multibyte char (US-ASCII) error.

I googled and read a lot, but the only solutions seem to be to

  • insert a BOM or
  • insert # coding: utf-8

at the beginning of each file.
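For reference, the magic-comment option could be automated with something like the sketch below. It is only an illustration, not a tested migration tool: add_magic_comment and MAGIC_COMMENT are hypothetical names, and the script assumes the file paths are passed on the command line. Files are read and written in binary mode so the umlauts cannot trigger an encoding error while the comment is being inserted.

```ruby
# Hypothetical helper (not part of Ruby or Rails): returns the source text
# with a "# coding: utf-8" magic comment added, unless one is already there.
MAGIC_COMMENT = "# coding: utf-8\n"

def add_magic_comment(source)
  lines = source.lines.to_a
  # The magic comment must be on line 1, or on line 2 if line 1 is a shebang.
  shebang = lines.first && lines.first.start_with?("#!") ? lines.shift : nil
  shebang += "\n" if shebang && !shebang.end_with?("\n")
  first = lines.first
  # Leave the file alone if it already declares an encoding.
  return source if first && first =~ /\A#.*coding\s*[:=]\s*[\w.-]+/
  [shebang, MAGIC_COMMENT, *lines].compact.join
end

# Assumed usage: ruby add_magic_comment.rb app/models/*.rb
if __FILE__ == $0
  ARGV.each do |path|
    original = File.open(path, "rb") { |f| f.read }  # raw bytes, no transcoding
    updated  = add_magic_comment(original)
    File.open(path, "wb") { |f| f.write(updated) } unless updated == original
  end
end
```

Running it twice on the same file should be harmless, since a file that already starts with an encoding comment is returned unchanged.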

My editor of choice (gEdit) doesn't seem to insert a BOM. I also read that having a BOM is bad practice because it may break some editors. It also breaks shell scripts if you want to use the shebang notation.

EDIT: The BOM breaks the Ruby 1.8.7 parser, giving a syntax error, unexpected kEND, expecting $end (SyntaxError) for the file!

I tried forcing the external encoding with ruby -Eutf-8:utf-8 but this seems to be ignored when calling rake (I tried: /home/malte/.rvm/gems/ruby-1.9.2-p180/bin/rake test).

So my question is:

As RVM is building ruby 1.9 from source anyway, is there a build option or a patch to change the default encoding from US-ASCII to UTF-8?

I took a quick look at the source code but couldn't find the line where the default is set (I'm no C expert, though).

Malte asked Mar 19 '11 03:03


1 Answer

I found a workaround: set the RUBYOPT environment variable, for example by executing

export RUBYOPT=-Ku

in your shell.

This sets -Ku as a default option for every ruby invocation, so you can call any tool that invokes ruby without worrying about parameters. rails server and rake both work and treat all files as UTF-8. No BOM or magic comments necessary!
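To check whether the setting is actually picked up, a small diagnostic like the following may help. It is a sketch only: encoding_report is a hypothetical helper name, and it relies on the Encoding API introduced in Ruby 1.9, so it will not run on 1.8.7. With RUBYOPT=-Ku exported, the source and external entries would be expected to read UTF-8 on Ruby 1.9.

```ruby
# Hypothetical diagnostic: collects the encodings the interpreter is using.
def encoding_report
  {
    :source   => __ENCODING__.name,               # encoding assumed for this file
    :external => Encoding.default_external.name,  # default for data read via IO
    :internal => Encoding.default_internal ? Encoding.default_internal.name : nil
  }
end

# Print one line per setting, e.g. when run as a plain script.
encoding_report.each { |kind, name| puts "#{kind}: #{name.inspect}" }
```

Running the script once with and once without the variable set makes the difference visible.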

Malte answered Oct 21 '22 18:10