I know this has been asked several times, but something strange is happening to me:
I have an index view where rendering certain characters (accented letters) causes Rails to raise the exception
incompatible character encodings: ASCII-8BIT and UTF-8
So I checked my strings' encoding, and it is actually ASCII-8BIT everywhere, even though I set the encoding to UTF-8 in my application.rb:
config.encoding = "utf-8"
and in my environment.rb:
Encoding.default_external = Encoding::UTF_8
Encoding.default_internal = Encoding::UTF_8
and my database shows:
character_set_database = utf-8
as suggested in some guides.
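To see what those settings do and do not change, a minimal check in plain Ruby (no Rails) can compare the process-wide defaults with the tag on an individual string. The input below is simulated with String#b, an assumption standing in for text that arrives as raw bytes; the point is that the defaults do not retag a String that is already marked ASCII-8BIT:

```ruby
# Process-wide defaults (the ones environment.rb sets in the question).
puts "default_external: #{Encoding.default_external}"
puts "default_internal: #{Encoding.default_internal.inspect}"  # nil unless set

# Hypothetical input: "à" as its UTF-8 bytes (0xC3 0xA0), tagged binary,
# standing in for text that arrives from a source returning raw bytes.
submitted = "\xC3\xA0".b

# The tag stays ASCII-8BIT whatever the defaults above are.
puts submitted.encoding == Encoding::ASCII_8BIT  # true
p submitted.bytes                                # [195, 160]
puts submitted.ascii_only?                       # false: bytes above 0x7F

# Would the same bytes be valid if they were tagged UTF-8?
puts submitted.dup.force_encoding(Encoding::UTF_8).valid_encoding?  # true
```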
The strings are entered through a textarea field and are not concatenated with any other previously inserted string.
The strange things are:
- str.force_encoding('utf-8') fixes the problem in my development environment, whereas in my production environment it does not work. (In development I'm on Ruby 2.0.0, in production on Ruby 2.1.0; both use Rails 4 and MySQL.)
- The magic comment # encoding utf-8 also doesn't work.
- str.force_encoding('ascii-8bit').encode('utf-8') raises Encoding::UndefinedConversionError "\xC3" from ASCII-8BIT to UTF-8, where \xC3 is the first byte of an à.
- body.force_encoding('ascii-8bit').encode('UTF-8', :invalid => :replace, :undef => :replace, :replace => '?') replaces all accented characters with a ?.
- str.force_encoding('iso-8859-1').encode('utf-8') obviously generates the wrong character (a ?).

So I have two questions:
- Why is Rails setting the string encoding to ASCII-8BIT?
- How can I solve this issue?
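The four attempts listed above can be reproduced in plain Ruby, assuming the submitted text is "à" stored as its UTF-8 bytes (0xC3 0xA0) but tagged ASCII-8BIT. This is a sketch of the underlying String behavior, not of what Rails does internally: force_encoding only relabels bytes, while encode transcodes them, and ASCII-8BIT gives bytes above 0x7F no meaning to transcode from.

```ruby
# Hypothetical input: "à" stored as its UTF-8 bytes but tagged ASCII-8BIT.
bytes = "\xC3\xA0".b

# 1. force_encoding only relabels the bytes; correct when they really are UTF-8.
as_utf8 = bytes.dup.force_encoding(Encoding::UTF_8)

# 2. encode transcodes; bytes >= 0x80 have no mapping from ASCII-8BIT, so it raises.
conversion_error =
  begin
    bytes.encode(Encoding::UTF_8)
    nil
  rescue Encoding::UndefinedConversionError => e
    e
  end

# 3. Same transcoding, but each unmappable byte is replaced, so "à" becomes "??".
replaced = bytes.encode(Encoding::UTF_8, invalid: :replace, undef: :replace, replace: "?")

# 4. Treating the bytes as Latin-1 maps each byte to its own character (mojibake).
latin1 = bytes.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)

puts as_utf8                   # à
puts conversion_error.message
puts replaced                  # ??
p latin1.codepoints            # [195, 160]: U+00C3 and U+00A0, not U+00E0
```

Only the first attempt preserves the character, and only because the bytes were valid UTF-8 to begin with; if they came from some other encoding, relabeling them would be wrong too.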
I've already checked these questions (the newest ones involving Rails 4):
Rails View Encoding Issues
"\xC2" to UTF-8 in conversion from ASCII-8BIT to UTF-8
How to convert a string to UTF8 in Ruby
Encoding::UndefinedConversionError: "\xE4" from ASCII-8BIT to UTF-8
as well as other resources, but nothing worked.
You probably have a string literal somewhere in your source code that you then concatenate another string to. For instance:
some_string = "this is a string"
or even
some_string = "" #empty string
Those strings, stored in some_string, will be marked ASCII-8BIT, and if you later do something like:
some_string = some_string + unicode_string
Then you'll get the error. Those strings will be marked ASCII-8BIT unless you add the following to the top of the file where the string literals are created:
#encoding: utf-8
That declaration determines the default encoding that string literals in source code will have.
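A minimal sketch of both points in plain Ruby (no Rails); the variable names and sample strings are made up for illustration. Two facts worth keeping in mind while reading it: on Ruby 2.0 and later, a file without a magic comment already defaults to UTF-8, and for ASCII-compatible encodings like these Ruby refuses a concatenation only when both operands contain non-ASCII bytes under different encodings.

```ruby
# encoding: utf-8
# The magic comment must be the first line of the file (or the second,
# after a shebang). Since Ruby 2.0, UTF-8 is the default even without it.

# String literals take the source encoding of the file they appear in.
some_string = "this is a string"
puts some_string.encoding == __ENCODING__  # true

# Hypothetical values: the UTF-8 bytes of "à" tagged ASCII-8BIT (binary),
# and a genuine UTF-8 string containing a non-ASCII character.
binary = "\xC3\xA0".b
unicode_string = "caf\u00E9"

# Both operands hold non-ASCII bytes under different encodings, so + raises.
concat_error =
  begin
    binary + unicode_string
    nil
  rescue Encoding::CompatibilityError => e
    e
  end
puts concat_error.message  # incompatible character encodings: ...

# An ASCII-only binary string is still compatible; the result is UTF-8.
mixed = "abc".b + unicode_string
puts mixed.encoding

# If the binary bytes really are UTF-8, relabeling them makes the concat work.
fixed = binary.dup.force_encoding(Encoding::UTF_8) + unicode_string
puts fixed  # àcafé
```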
This is just a guess, based on the fact that this pattern is a common source of the problem. To know for sure would take more information than your question contains: it would take debugging the actual source code to figure out exactly which string is tagged ASCII-8BIT when you expect it to be tagged UTF-8, and exactly where that string came from.
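One way to start that debugging is to walk the data you render (for example, a params-like Hash) and list every string that is tagged ASCII-8BIT and actually contains non-ASCII bytes. The helper below, find_binary_strings, is hypothetical and not part of Rails; it is a plain-Ruby sketch you could call on whatever structure you suspect.

```ruby
# Hypothetical helper: walk nested Hashes/Arrays and return the paths of
# strings tagged ASCII-8BIT that contain bytes above 0x7F, i.e. the ones
# that can trigger "incompatible character encodings" next to UTF-8 text.
def find_binary_strings(value, path = "root", found = [])
  case value
  when String
    # ASCII-only binary strings still combine with UTF-8, so skip them.
    found << path if value.encoding == Encoding::ASCII_8BIT && !value.ascii_only?
  when Hash
    value.each { |key, child| find_binary_strings(child, "#{path}[#{key.inspect}]", found) }
  when Array
    value.each_with_index { |child, i| find_binary_strings(child, "#{path}[#{i}]", found) }
  end
  found
end

# Usage with made-up data: only the binary "body" value is reported.
sample = { title: "plain ASCII", body: "\xC3\xA0".b, tags: ["ok".b, "caf\u00E9"] }
p find_binary_strings(sample)  # ["root[:body]"]
```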