
How to convert MySQL encoding from utf8 to utf8mb4 in a Rails project

I have a Rails 3.2 project using MySQL 5.5.34 with utf8 encoding. I have found that with utf8 encoding MySQL cannot save the Unicode characters that represent emoji.

Is it OK to convert the whole database to utf8mb4, which I read on the web can hold 4-byte Unicode characters, including emoji?

Is all the information I currently have in the database covered by utf8mb4? Will I face data loss if I do that?

Does Rails provide any way to do that?

Thanks a lot for helping.

larryzhao asked Dec 09 '13 08:12


People also ask

Which is better utf8 or utf8mb4?

The difference between utf8 and utf8mb4 is that the former can only store characters of up to 3 bytes, while the latter can store characters of up to 4 bytes. In Unicode terms, utf8 can only store characters in the Basic Multilingual Plane, while utf8mb4 can store any Unicode character.

What is the difference between utf8mb4 and utf8 charsets in MySQL?

MySQL's utf8 can store only 1-, 2-, or 3-byte characters, while utf8mb4 can store 4-byte characters as well. The characters covered by utf8 are a subset of those covered by utf8mb4.

What is charset utf8mb4?

utf8mb4: A UTF-8 encoding of the Unicode character set using one to four bytes per character. utf8mb3: A UTF-8 encoding of the Unicode character set using one to three bytes per character. This character set is deprecated in MySQL 8.0, and you should use utf8mb4 instead. utf8: An alias for utf8mb3.


1 Answer

Actually, you only need to migrate the columns that should be encoded with utf8mb4. In a Rails migration that looks like this:

execute("ALTER TABLE yourtablename MODIFY yourcolumnname TEXT CHARACTER SET utf8mb4 COLLATE utf8mb4_bin;")
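If several columns need the same change, the statement can be generated rather than typed out each time. The following is a minimal sketch in plain Ruby; the helper name `utf8mb4_alter_sql` and its default arguments are assumptions made for illustration, and the table and column names are the same placeholders used above.

```ruby
# Hypothetical helper: builds the ALTER TABLE statement for one column.
# Table and column names are placeholders, not names from a real schema.
def utf8mb4_alter_sql(table, column, type = "TEXT", collation = "utf8mb4_bin")
  "ALTER TABLE `#{table}` MODIFY `#{column}` #{type} " \
    "CHARACTER SET utf8mb4 COLLATE #{collation};"
end

# Inside a Rails migration this string would be passed to `execute`.
puts utf8mb4_alter_sql("yourtablename", "yourcolumnname")
```

The column type has to be repeated in a MODIFY clause, so pass the column's real type instead of the TEXT default where it differs.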

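Migrating the existing data itself should be safe: MySQL's utf8 stores characters of at most 3 bytes, utf8mb4 stores characters of up to 4 bytes, and every utf8 character is encoded identically in utf8mb4. What the conversion cannot repair is any 4-byte character (such as an emoji) that was written while the column was still utf8, so you might already have truncated or corrupt values in your db.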
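The byte-length difference is easy to check from Ruby itself. This is a small sketch, not part of Rails; the method name `needs_utf8mb4?` is hypothetical, and the characters are written as Unicode escapes so the file's source encoding does not matter.

```ruby
# Returns true when the string contains a character that takes 4 bytes in
# UTF-8, i.e. one outside the Basic Multilingual Plane.
def needs_utf8mb4?(string)
  string.each_char.any? { |char| char.bytesize == 4 }
end

puts "a".bytesize           # 1 byte  (ASCII letter)
puts "\u00e9".bytesize      # 2 bytes (e with acute accent)
puts "\u20ac".bytesize      # 3 bytes (euro sign)
puts "\u{1F600}".bytesize   # 4 bytes (grinning face emoji)

puts needs_utf8mb4?("hello")            # false
puts needs_utf8mb4?("hello \u{1F600}")  # true
```

A check like this could be run over incoming text to see which columns actually receive 4-byte characters before deciding what to migrate.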

Furthermore, Rails 3.2 has an encoding issue in ActiveSupport's JSON encoding. If you plan to work with JSON and emoji, you will need to add a patch like the following (based on the solution in Rails 4: https://github.com/rails/rails/blob/4-0-stable/activesupport/lib/active_support/json/encoding.rb), or simply upgrade to Rails 4.

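# Monkeypatch for Rails 3.2: reopens ActiveSupport::JSON::Encoding and
# replaces its `escape` method.
module ActiveSupport
  module JSON
    module Encoding
      class << self
        def escape(string)
          if string.respond_to?(:force_encoding)
            # Transcode to UTF-8, then treat the result as raw bytes.
            string = string.encode(::Encoding::UTF_8, :undef => :replace).force_encoding(::Encoding::BINARY)
          end
          # Substitute each match of escape_regex with its entry in ESCAPED_CHARS.
          json = string.gsub(escape_regex) { |s| ESCAPED_CHARS[s] }
          # Wrap the escaped text in double quotes.
          json = %("#{json}")
          # Tag the result as UTF-8 again so multibyte characters stay intact.
          json.force_encoding(::Encoding::UTF_8) if json.respond_to?(:force_encoding)
          json
        end
      end
    end
  end
end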
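To confirm that a 4-byte character survives JSON encoding, a round trip can be compared with the original. The sketch below uses Ruby's standard `json` library as a stand-in so it runs without Rails; it does not exercise the patch above. In a Rails console the same comparison could be made with `ActiveSupport::JSON.encode` and `ActiveSupport::JSON.decode`.

```ruby
require "json"

emoji = "\u{1F600}"                # a 4-byte character
encoded = JSON.generate([emoji])   # wrapped in an array to form a JSON document
decoded = JSON.parse(encoded).first

puts decoded == emoji     # true when the character survived the round trip
puts decoded.bytesize     # still 4 bytes if nothing was replaced or truncated
```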
schmierkov answered Sep 26 '22 07:09