 

ActiveRecord returns data in ASCII-8BIT under Ruby 1.9.2-rc1

As the title says, strings loaded through ActiveRecord always come back with the encoding ASCII-8BIT, despite my best efforts to force UTF-8. I have included as much detail as possible here to build a useful bug report that someone could use to help me out!

The project is using the following technologies:

  • Padrino Framework
  • Ruby 1.9.2-rc2 (also tried 1.9.1 and 1.9.2-preview3)
  • ActiveRecord
  • MySQL

Relevant gem versions:

$ bundle show | ack '(record|padrino)'
  * activerecord (2.3.8)
  * padrino (0.9.14)
  * padrino-admin (0.9.14)
  * padrino-core (0.9.14)
  * padrino-gen (0.9.14)
  * padrino-helpers (0.9.14)
  * padrino-mailer (0.9.14)

Episodes Table:

mysql> DESCRIBE `episodes`;
+----------------+--------------+------+-----+---------+----------------+
| Field          | Type         | Null | Key | Default | Extra          |
+----------------+--------------+------+-----+---------+----------------+
| id             | int(11)      | NO   | PRI | NULL    | auto_increment |
| show_id        | int(11)      | YES  |     | NULL    |                |
| season_id      | int(11)      | YES  |     | NULL    |                |
| episode_number | int(11)      | YES  |     | NULL    |                |
| title          | varchar(255) | YES  |     | NULL    |                |
| year           | int(11)      | YES  |     | NULL    |                |
+----------------+--------------+------+-----+---------+----------------+
6 rows in set (0.02 sec)

mysql> SHOW CREATE TABLE episodes;
       Table: episodes
Create Table: CREATE TABLE `episodes` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `show_id` int(11) DEFAULT NULL,
  `season_id` int(11) DEFAULT NULL,
  `episode_number` int(11) DEFAULT NULL,
  `title` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
  `year` int(11) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=74332 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci

mysql> SHOW CREATE DATABASE development;
+-------------+--------------------------------------------------------------------------------------------------------+
| Database    | Create Database                                                                                        |
+-------------+--------------------------------------------------------------------------------------------------------+
| development | CREATE DATABASE `development` /*!40100 DEFAULT CHARACTER SET utf8 COLLATE utf8_unicode_ci */           |
+-------------+--------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

As you can see, the database is configured for UTF-8, and so is the database adapter:

ActiveRecord::Base.configurations[:development] = {
  :adapter   => 'mysql',
  :encoding  => 'utf8',
  :reconnect => false,
  :database  => "development",
  :pool      => 5,
  :username  => 'root',
  :password  => '',
  :host      => 'localhost',
}

This is reflected in the console when examining the ActiveRecord connection:

ruby-1.9.2-rc1 > ActiveRecord::Base.connection
  DEBUG - [06/Jul/2010 19:24:32] "SQL (0.1ms)   SET NAMES 'utf8'"
  DEBUG - [06/Jul/2010 19:24:32] "SQL (0.1ms)   SET SQL_AUTO_IS_NULL=0"
 => #<ActiveRecord::ConnectionAdapters::MysqlAdapter:0x0000010936fa88 @logger=#<Padrino::Logger:0x00000101587198 @buffer=[], @auto_flush=true, @level=0, @log=#<IO:<STDOUT>>, @mutex=#<Mutex:0x00000101587148>, @format_datetime="%d/%b/%Y %H:%M:%S", @format_message="%s - [%s] \"%s\"">, @connection=#<Mysql:0x0000010936fad8>, @runtime=0.2608299255371094, @last_verification=0, @query_cache_enabled=false, @config={:adapter=>"mysql", :encoding=>"utf8", :reconnect=>false, :database=>"development", :pool=>5, :username=>"root", :password=>"", :host=>"localhost"}, @connection_options=["localhost", "root", "", "development", nil, nil, 131072], @quoted_table_names={}, @quoted_column_names={}> 

ruby-1.9.2-rc1 > ActiveRecord::Base.connection.encoding

Ruby should also know the locale; here is the output of $ locale:

LANG="en_GB.UTF-8"
LC_COLLATE="en_GB.utf-8"
LC_CTYPE="en_GB.utf-8"
LC_MESSAGES="en_GB.utf-8"
LC_MONETARY="en_GB.utf-8"
LC_NUMERIC="en_GB.utf-8"
LC_TIME="en_GB.utf-8"
LC_ALL=

However, Ruby does not set Encoding.default_internal:

$ irb --simple-prompt
ruby-1.9.2-rc1 > Encoding.default_internal
 => nil 
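For context, a minimal sketch of where the two defaults come from (standard Ruby 1.9+ behaviour, not specific to this project): Ruby derives Encoding.default_external from the locale unless it is overridden (for example with the -E command-line option), but it leaves Encoding.default_internal as nil until something assigns it. With default_internal nil, Ruby performs no automatic transcoding, so a UTF-8 locale on its own does not change how strings are labelled.

```ruby
# Inspect the process-wide encoding defaults.
# default_external: assumed encoding of external data (files, IO);
#   taken from the locale (LANG / LC_*) unless -E was given.
# default_internal: target encoding for automatic transcoding; nil by default.
external = Encoding.default_external
internal = Encoding.default_internal

puts "locale charmap:   #{Encoding.locale_charmap}"
puts "default_external: #{external.inspect}"
puts "default_internal: #{internal.inspect}"
```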

I have added a snippet in my application's config/boot.rb that looks like this:

# Force UTF-8 as the default internal encoding on Rubies that support it (1.9+)
if defined?(Encoding) && Encoding.respond_to?(:default_internal=)
  Encoding.default_internal = Encoding::UTF_8
end

That works as you might expect, but it is a hack, and it doesn't solve the problem.

And here's the output of the problem in situ:

ruby-1.9.2-rc1 > e = Episode.new
  DEBUG - [06/Jul/2010 19:29:14] "SQL (0.1ms)   SET NAMES 'utf8'"
  DEBUG - [06/Jul/2010 19:29:14] "SQL (0.1ms)   SET SQL_AUTO_IS_NULL=0"
  DEBUG - [06/Jul/2010 19:29:14] "Episode Columns (0.8ms)   SHOW FIELDS FROM `episodes`"
 => #<Episode id: nil, show_id: nil, season_id: nil, episode_number: nil, title: nil, year: nil> 
ruby-1.9.2-rc1 > e.title
 => nil
ruby-1.9.2-rc1 > nt = "New Title"
 => "New Title" 
ruby-1.9.2-rc1 > nt.encoding
 => #<Encoding:UTF-8> 
ruby-1.9.2-rc1 > e.title = nt
 => "New Title" 
ruby-1.9.2-rc1 > e.title.encoding
 => #<Encoding:UTF-8> 
ruby-1.9.2-rc1 > e.save
  DEBUG - [06/Jul/2010 19:29:48] "SQL (0.1ms)   BEGIN"
  DEBUG - [06/Jul/2010 19:29:48] "Episode Create (0.2ms)   INSERT INTO `episodes` (`show_id`, `season_id`, `episode_number`, `title`, `year`) VALUES(NULL, NULL, NULL, 'New Title', NULL)"
  DEBUG - [06/Jul/2010 19:29:48] "SQL (0.4ms)   COMMIT"
 => true 
ruby-1.9.2-rc1 > Episode.find_by_title(nt).title.encoding
  DEBUG - [06/Jul/2010 19:30:04] "Episode Load (29.5ms)   SELECT * FROM `episodes` WHERE (`episodes`.`title` = 'New Title') LIMIT 1"
 => #<Encoding:ASCII-8BIT> 
ruby-1.9.2-rc1 > 
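The output above is consistent with the driver handing back the stored UTF-8 bytes but labelling them ASCII-8BIT. A minimal plain-Ruby sketch (no database involved; the byte string is only a stand-in for what the driver returns) shows the difference between relabelling and transcoding such a string: force_encoding just changes the label, whereas encode treats the bytes as binary and refuses to convert anything outside ASCII.

```ruby
# Stand-in for what the driver returns: UTF-8 bytes labelled ASCII-8BIT.
raw = "Caf\u00e9".dup.force_encoding(Encoding::ASCII_8BIT)
puts raw.encoding                  # label: binary (ASCII-8BIT)
puts raw.bytesize                  # 5 (the e-acute takes two bytes)

# force_encoding only changes the label; the bytes are untouched.
relabelled = raw.dup.force_encoding(Encoding::UTF_8)
puts relabelled.encoding           # label: UTF-8
puts relabelled.valid_encoding?    # true
puts relabelled.length             # 4 characters

# encode tries to *convert* from binary; bytes >= 0x80 have no mapping
# from ASCII-8BIT to UTF-8, so this raises.
begin
  raw.encode(Encoding::UTF_8)
rescue Encoding::UndefinedConversionError => e
  puts "encode failed: #{e.class}"
end
```

An ASCII-only value such as "New Title" converts without error either way, which may be why an encode-based workaround appears to succeed until a title contains non-ASCII characters.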

I had some success by overriding the attribute readers and redefining them as:

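class Episode < ActiveRecord::Base
  # ...
  # Read the stored value with read_attribute; calling `title` here
  # would invoke this method again and recurse forever.
  def title
    value = read_attribute(:title)
    value && value.encode!
  end
  # ...
end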

Here encode! is the in-place form of String#encode, which the Ruby 1.9 API docs describe as follows: "with no options returns a copy of str transcoded to Encoding.default_internal." (encode! transcodes the receiver itself rather than returning a copy.)
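A possible variant of that workaround, sketched under the assumption that the stored bytes really are UTF-8: the hypothetical helper utf8_attribute below relabels a copy with force_encoding instead of transcoding it, and returns the original value unchanged if the bytes are not valid UTF-8. Inside the model it would be fed the raw attribute (for example via read_attribute(:title)) so that the reader never calls itself.

```ruby
# Hypothetical helper: return a UTF-8-labelled copy of a string whose bytes
# are assumed to already be UTF-8 (e.g. labelled ASCII-8BIT by the driver).
# Non-strings (including nil) pass through; invalid byte sequences are
# returned unchanged rather than mislabelled.
def utf8_attribute(value)
  return value unless value.is_a?(String)
  return value if value.encoding == Encoding::UTF_8

  copy = value.dup.force_encoding(Encoding::UTF_8)
  copy.valid_encoding? ? copy : value
end

# Sketch of use inside the model (not executed here):
#   def title
#     utf8_attribute(read_attribute(:title))
#   end

binary = "Caf\u00e9".dup.force_encoding(Encoding::ASCII_8BIT)
fixed  = utf8_attribute(binary)
puts fixed.encoding == Encoding::UTF_8   # true
puts fixed == "Caf\u00e9"                # true
puts utf8_attribute(nil).inspect         # nil
```

Relabelling rather than transcoding is the design choice here: the bytes coming out of a utf8 table are already correct, only their label is wrong.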

Whilst my workarounds are successful, I would much prefer to have UTF-8 strings coming out of the database, which is what my configuration indicates should already be happening.

asked Jul 06 '10 17:07 by Lee Hambley


1 Answer

You probably need the ruby-mysql gem, which is encoding-aware under Ruby 1.9, instead of the more common mysql gem, which isn't. See my blog for details.
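In a Bundler-managed project like the one in the question, the suggested swap might look like the following Gemfile fragment. This is a sketch only: version constraints are omitted, and it assumes that ruby-mysql can stand in for the mysql gem under the existing :adapter => 'mysql' setting, which the answer implies but does not spell out.

```ruby
# Gemfile (fragment)
# gem 'mysql'      # C extension: not encoding-aware under 1.9, per the answer
gem 'ruby-mysql'   # pure-Ruby driver: encoding-aware under 1.9, per the answer
```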

answered Sep 26 '22 14:09 by Ralph von der Heyden