I want to port a Rails app from Ruby 1.8.7 to 1.9.2. Some of the files contain umlauts like ä/ö/ü, both within strings and in comments. The files were saved as UTF-8 but without a BOM (byte order mark) at the beginning.
As you might know, Ruby 1.9 refuses to parse these files, giving an "invalid multibyte char (US-ASCII)" error.
I googled and read a lot, but the only solution seems to be to put
# coding: utf-8
at the beginning of each file.
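For reference, adding that comment to many files could be automated. The following is only a minimal sketch: `add_magic_comment` is a hypothetical helper name, not part of Ruby or Rails, and it assumes the comment belongs on line 1, or on line 2 when line 1 is a shebang.

```ruby
MAGIC_COMMENT = "# coding: utf-8\n"

# Recognizes an existing magic comment such as "# coding: utf-8"
# or "# -*- encoding: utf-8 -*-".
MAGIC_PATTERN = /\A#.*coding[:=]\s*[\w.-]+/

# Hypothetical helper: returns the source text with a magic comment added,
# or the text unchanged if a magic comment is already in place.
def add_magic_comment(source)
  lines = source.lines.to_a

  # A shebang must stay on line 1, so the comment goes right after it.
  index = (lines.first && lines.first.start_with?("#!")) ? 1 : 0

  return source if lines[index] && lines[index] =~ MAGIC_PATTERN

  # Make sure a lone shebang line ends with a newline before inserting.
  lines[0] += "\n" if index == 1 && !lines[0].end_with?("\n")

  lines.insert(index, MAGIC_COMMENT)
  lines.join
end

puts add_magic_comment("#!/usr/bin/env ruby\nputs 'hello'\n")
```

To apply it to a project, one would read each `.rb` file, pass its contents through the helper, and write the result back. That is exactly the per-file change I would like to avoid.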
My editor of choice (gEdit) doesn't seem to insert a BOM. I also read that having a BOM is bad practice because it may break some editors. It also breaks shell scripts if you want to use the shebang notation.
EDIT: The BOM breaks the Ruby 1.8.7 parser, giving a "syntax error, unexpected kEND, expecting $end (SyntaxError)" for the file!
I tried forcing the external encoding with ruby -Eutf-8:utf-8, but this seems to be ignored when calling rake (I tried: /home/malte/.rvm/gems/ruby-1.9.2-p180/bin/rake test).
So my question is:
As RVM is building Ruby 1.9 from source anyway, is there a build option or a patch to change the default encoding from US-ASCII to UTF-8?
I took a quick look at the source code but couldn't find the line where the default is set (I'm no C expert, though).
I found a workaround:
Set the RUBYOPT environment variable, for example by executing
export RUBYOPT=-Ku
in your shell.
This will set -Ku as the default option when calling ruby. You can now call all other tools that invoke ruby without worrying about parameters. Both rails server and rake work and treat all files as UTF-8. No BOM or magic comments necessary!
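To check whether the option is actually picked up, a small script can print the encodings Ruby reports. This is just a sketch: `encoding_report` is a made-up helper name, and the values depend on your Ruby version and locale. Save it in a file without a magic comment and run it through the same tool chain; if the workaround took effect on 1.9, the source line should read UTF-8 rather than US-ASCII.

```ruby
# Hypothetical helper: collects the encodings in effect for this run.
def encoding_report
  {
    :source   => __ENCODING__.name,              # encoding assumed for this source file
    :external => Encoding.default_external.name, # default for data read from files and IO
    :rubyopt  => ENV["RUBYOPT"]                  # nil when the variable is not set
  }
end

encoding_report.each do |key, value|
  puts "#{key}: #{value.inspect}"
end
```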