The MySQL database used by my Rails application currently has the default collation of latin1_swedish_ci. Since the default charset of Rails applications (including mine) is UTF-8, it seems sensible to me to use the utf8_general_ci collation in the database.
Is my thinking correct?
Assuming it is, what would be the best approach to migrate the collation and all the data in the database to the new encoding?
UTF-8, like any other Unicode encoding scheme, can store characters from any language, so it is an excellent choice of character set for your database.
The collation setting, on the other hand, is a completely separate issue from the encoding scheme. It governs sort order, upper/lowercase conversion, string equality comparisons, and similar language-specific behavior. The collation should match the language that is used in the database.
The utf8_general_ci collation is a generic choice for situations where the language is unknown or mixed and a simple default ordering is enough. It is case-insensitive and uses simplified comparison rules, so it is not a plain Unicode code point ordering (in MySQL that is utf8_bin), but it does not follow any particular language's alphabet either. If you are storing Swedish text, a language-specific collation such as utf8_swedish_ci is likely a better fit.
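To make the difference between encoding and collation concrete, here is a minimal Ruby sketch, not MySQL itself. It sorts the same UTF-8 strings twice: once by raw byte/code point order, and once with a hypothetical `swedish_sort` helper that assumes the Swedish alphabet ends in å, ä, ö and ignores case. The helper and its `SWEDISH_ALPHABET` constant are illustrative assumptions and only handle single words made of those letters.

```ruby
# Assumed for this demo: the Swedish alphabet, where å, ä, ö follow z.
SWEDISH_ALPHABET = "abcdefghijklmnopqrstuvwxyzåäö".freeze

# Hypothetical helper: case-insensitive sort using the alphabet above.
# Characters outside SWEDISH_ALPHABET are not handled in this sketch.
def swedish_sort(words)
  words.sort_by { |w| w.downcase.chars.map { |c| SWEDISH_ALPHABET.index(c) } }
end

words = %w[å ä Z a]

# String#<=> compares bytes, which for UTF-8 matches code point order.
code_point_order = words.sort

p code_point_order      # => ["Z", "a", "ä", "å"]
p swedish_sort(words)   # => ["a", "Z", "å", "ä"]
```

The stored bytes are identical in both cases; only the comparison rules differ, which is the role a collation plays in the database.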
Convert to UTF-8 as the charset.
Collation settings only affect sorting and string comparison. Choose the collation that most of your users would expect.
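As a rough sketch of the migration step asked about in the question, the Ruby helper below builds the MySQL `ALTER DATABASE` and `ALTER TABLE ... CONVERT TO CHARACTER SET` statements for a list of tables. The helper name `conversion_statements`, the database name, and the table names are hypothetical; the default collation is the one from the question and can be swapped for another. It assumes the existing data really is stored as latin1, and a backup before running anything like this is strongly advisable.

```ruby
# Hypothetical helper: builds the MySQL statements that change the database
# default and convert each listed table to the given charset and collation.
def conversion_statements(database, tables, charset: "utf8", collation: "utf8_general_ci")
  # Changes the default for tables created later; existing tables are untouched.
  statements = ["ALTER DATABASE `#{database}` CHARACTER SET #{charset} COLLATE #{collation};"]

  # CONVERT TO rewrites the text columns of an existing table.
  tables.each do |table|
    statements << "ALTER TABLE `#{table}` CONVERT TO CHARACTER SET #{charset} COLLATE #{collation};"
  end

  statements
end

# Example with made-up names; prints one statement per line.
puts conversion_statements("myapp_production", %w[users posts])
```

In a Rails migration, each generated string could be passed to `execute`, with the table list taken from the connection rather than hard-coded.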