Rails encoding in ASCII-8BIT

I know this has been asked several times, but something strange is happening in my case:

I have an index view where rendering certain characters (accented letters) causes Rails to raise the exception

incompatible character encodings: ASCII-8BIT and UTF-8

So I checked the encoding of my strings, and it is actually ASCII-8BIT everywhere, even though I set the encoding to UTF-8 in my application.rb

config.encoding = "utf-8"

and in my environment.rb

Encoding.default_external = Encoding::UTF_8
Encoding.default_internal = Encoding::UTF_8

and my database reports:

character_set_database = utf-8

as suggested in some guides.

Strings are entered through a textarea field and are not concatenated with any previously inserted string.

The strange things are:

  • This happens only in the index view, not in the show view (same resource).
  • This happens only for this model (an email, with subject and body, but that shouldn't affect anything).
  • In my development environment everything works once I call str.force_encoding('utf-8'), whereas in production this does not work. (Development runs Ruby 2.0.0, production Ruby 2.1.0; both use Rails 4 and MySQL.)
  • Adding # encoding utf-8 to the view file doesn't work either.
  • str.force_encoding('ascii-8bit').encode('utf-8') raises Encoding::UndefinedConversionError "\xC3" from ASCII-8BIT to UTF-8 (the character is an à). body.force_encoding('ascii-8bit').encode('UTF-8', :invalid => :replace, :undef => :replace, :replace => '?') replaces all accented characters with a ?. And str.force_encoding('iso-8859-1').encode('utf-8') obviously generates the wrong character (a ?).
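The attempts in the last bullet can be compared side by side outside Rails. The following is only a sketch: conversion_attempts is a made-up helper name, and the input is assumed to be the UTF-8 bytes of "à" tagged as ASCII-8BIT, which is what the "\xC3" in the error message suggests. The real strings in the application may differ.

```ruby
# Hypothetical helper: runs the conversions from the list above on copies of `raw`.
def conversion_attempts(raw)
  results = {}

  # force_encoding only relabels the existing bytes; nothing is transcoded.
  results[:relabel] = raw.dup.force_encoding("UTF-8")

  # encode transcodes, and ASCII-8BIT has no mapping for bytes >= 0x80.
  results[:transcode] =
    begin
      raw.dup.force_encoding("ASCII-8BIT").encode("UTF-8")
    rescue Encoding::UndefinedConversionError => e
      e.class
    end

  # With replacement options, each unmapped byte becomes "?".
  results[:replace] = raw.dup.force_encoding("ASCII-8BIT")
                         .encode("UTF-8", invalid: :replace, undef: :replace, replace: "?")

  # Treating the bytes as ISO-8859-1 converts without error, but to the wrong characters.
  results[:latin1] = raw.dup.force_encoding("ISO-8859-1").encode("UTF-8")

  results
end

# Assumed input: the UTF-8 bytes of "à", tagged ASCII-8BIT via String#b.
attempts = conversion_attempts("\xC3\xA0".b)
attempts.each { |name, value| puts "#{name}: #{value.inspect}" }
```

Only the relabel variant keeps the original character, because the bytes are already valid UTF-8 and only the encoding tag is wrong. The other variants either fail or replace the bytes.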

So I have two questions:

  • Why is Rails setting the string encoding to ASCII-8BIT?
  • How can I solve this issue?

I've already checked these questions (the newest ones that cover Rails 4):

Rails View Encoding Issues

"\xC2" to UTF-8 in conversion from ASCII-8BIT to UTF-8

How to convert a string to UTF8 in Ruby

Encoding::UndefinedConversionError: "\xE4" from ASCII-8BIT to UTF-8

and other resources as well, but nothing worked.

asked by sissy, Nov 23 '22 02:11


1 Answer

You probably have a string literal somewhere in your source code that you then concatenate another string to. For instance:

some_string = "this is a string"

or even

some_string = "" # empty string

Those strings, stored in some_string, will be marked ASCII-8BIT, and if you then later do something like:

some_string = some_string + unicode_string

Then you'll get the error. That is, those strings will be marked ASCII-8BIT unless you add, to the top of the file where the string literals are created:

#encoding: utf-8

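That declaration determines the default encoding that string literals in that source file will have. Note that Ruby 2.0 and later already default to UTF-8 for source files, so the magic comment matters mostly on Ruby 1.9.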
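As a rough illustration of the failure mode (not the asker's actual code), the sketch below concatenates binary-tagged strings with a UTF-8 string. try_concat and the variable names are made up for the example. String#b and the \u escape are used so that the encodings do not depend on the file's source encoding.

```ruby
# Hypothetical helper: attempts the concatenation and reports what happened.
def try_concat(left, right)
  [:ok, left + right]
rescue Encoding::CompatibilityError => e
  [:error, e.message]
end

utf8         = "caf\u00E9"    # "café"; a \u escape always produces a UTF-8 string
binary_ascii = "abc".b        # tagged ASCII-8BIT, but only 7-bit bytes
binary_high  = "\xC3\xA0".b   # UTF-8 bytes of "à", tagged ASCII-8BIT

p try_concat(binary_ascii, utf8)  # succeeds: one side is pure ASCII
p try_concat(binary_high, utf8)   # fails: both sides contain non-ASCII bytes
```

The concatenation only fails when both strings contain non-ASCII bytes, which would fit the observation that only accented letters trigger the exception.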

I am just guessing, because this pattern is a common source of this problem. Being sure would take more information than is in your question: it would mean debugging the actual source code to figure out exactly which string is tagged ASCII-8BIT when you expect it to be tagged UTF-8, and exactly where that string came from.
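One possible way to start that debugging is to check how each candidate string is tagged. This is a minimal sketch with made-up helper names and made-up sample values; in the real application the hash would hold whatever values the index view renders.

```ruby
# Hypothetical helper: true for strings tagged ASCII-8BIT that contain bytes
# outside 7-bit ASCII, i.e. the ones that can clash with UTF-8 text.
def binary_with_high_bytes?(value)
  value.is_a?(String) && value.encoding == Encoding::ASCII_8BIT && !value.ascii_only?
end

# Hypothetical helper: names of the hash entries that match the check above.
def suspect_fields(fields)
  fields.select { |_name, value| binary_with_high_bytes?(value) }.keys
end

# Made-up stand-ins for the values rendered in the index view.
fields = { subject: "Hello", body: "\xC3\xA0".b, note: "caf\u00E9" }
p suspect_fields(fields)
```

Once a suspect string is found, the next step is to trace where it was created or loaded, since that is where the ASCII-8BIT tag comes from.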

answered by jrochkind, Dec 02 '22 12:12