I have a Rails 3.2 project using Mysql 5.5.34
, with utf8 encoding. Now I found that with utf8
encoding Mysql could not save unicode characters which represent emoji.
So is it OK for me to convert the whole database to use utf8mb4
encoding that I found on the web that could hold 4 byte unicode include emoji?
Is all the information I have in the database covered by utf8mb4
encoding? Will I face data loses if I do that?
Is there any way that Rails provide to do that?
Thanks a lot for helping.
The difference between utf8 and utf8mb4 is that the former can only store 3 byte characters, while the latter can store 4 byte characters. In Unicode terms, utf8 can only store characters in the Basic Multilingual Plane, while utf8mb4 can store any Unicode character.
utf-8 can store only 1, 2 or 3 bytes characters, while utf8mb4 can store 4 bytes characters as well. utf-8 is a subset of characters given by utf8mb4 .
utf8mb4 : A UTF-8 encoding of the Unicode character set using one to four bytes per character. utf8mb3 : A UTF-8 encoding of the Unicode character set using one to three bytes per character. This character set is deprecated in MySQL 8.0, and you should use utfmb4 instead. utf8 : An alias for utf8mb3 .
Actually you just need to migrate the column you want to encode with utf8mb4.
execute("ALTER TABLE yourtablename MODIFY yourcolumnname TEXT CHARACTER SET utf8mb4 COLLATE utf8mb4_bin;")
If you plan to migrate the data itself it might not be possible, since the common utf8 consists out of 3 byte chars and the utf8mb4 out of 4 byte. So you might already have corrupt data in your db.
Furthermore Rails 3.2 has an encoding issue within ActiveSupports JSON encoding. In case you plan to work with json and emojis, you will need to add a patch like the following (based on the solution in rails 4 https://github.com/rails/rails/blob/4-0-stable/activesupport/lib/active_support/json/encoding.rb) or just simply upgrade to rails 4.
module ActiveSupport
module JSON
module Encoding
class << self
def escape(string)
if string.respond_to?(:force_encoding)
string = string.encode(::Encoding::UTF_8, :undef => :replace).force_encoding(::Encoding::BINARY)
end
json = string.gsub(escape_regex) { |s| ESCAPED_CHARS[s] }
json = %("#{json}")
json.force_encoding(::Encoding::UTF_8) if json.respond_to?(:force_encoding)
json
end
end
end
end
end
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With