How to change the CHARACTER SET (and COLLATION) throughout a database?

What do the parts of `utf8mb4_0900_ai_ci` mean?

up to 3 bytes per character -- utf8 (an alias of utf8mb3)
up to 4 bytes per character -- utf8mb4 (new)

UCA 4.0.0 --  _unicode_
UCA 5.2.0 --  _unicode_520_
UCA 9.0.0 --  _0900_ (new)

_bin      -- just compare the bits; don't consider case folding, accents, etc.
_ci       -- explicitly case insensitive (A=a) and implicitly accent insensitive (a=á)
_ai_ci    -- explicitly case insensitive and accent insensitive
_as (etc) -- accent sensitive (etc.)

_bin         -- simple, fast
_general_ci  -- can't match multi-letter equivalents (e.g. doesn't treat ss as equal to ß); somewhat fast
...          -- slower
_0900_       -- (MySQL 8.0) much faster because of a rewrite
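To make the naming scheme above concrete, here is a small Python sketch that splits a collation name into the parts just listed. `describe_collation` is a hypothetical helper, not part of MySQL or any driver; it only knows the suffixes in the cheat sheet and ignores anything else (for example the locale codes in language-specific collations). For what a given server actually offers, `SHOW COLLATION` remains the authoritative source.

```python
# Hypothetical helper: decode a MySQL collation name using only the
# naming conventions listed above. Unknown parts are ignored.

def describe_collation(name: str) -> dict:
    """Split a name such as 'utf8mb4_0900_ai_ci' into its parts."""
    charset, *parts = name.lower().split("_")
    info = {"charset": charset, "algorithm": None,
            "case_sensitive": None, "accent_sensitive": None}

    # Which Unicode Collation Algorithm (UCA) version, or 'general'.
    if parts[:2] == ["unicode", "520"]:
        info["algorithm"], parts = "UCA 5.2.0", parts[2:]
    elif parts[:1] == ["unicode"]:
        info["algorithm"], parts = "UCA 4.0.0", parts[1:]
    elif parts[:1] == ["0900"]:
        info["algorithm"], parts = "UCA 9.0.0", parts[1:]
    elif parts[:1] == ["general"]:
        info["algorithm"], parts = "general", parts[1:]

    for part in parts:
        if part == "bin":
            # _bin compares the encoded values: case and accents both matter.
            info["algorithm"] = "binary"
            info["case_sensitive"] = info["accent_sensitive"] = True
        elif part in ("ai", "as"):
            info["accent_sensitive"] = part == "as"
        elif part in ("ci", "cs"):
            info["case_sensitive"] = part == "cs"

    # A bare _ci is implicitly accent insensitive as well (a = á).
    if info["case_sensitive"] is False and info["accent_sensitive"] is None:
        info["accent_sensitive"] = False
    return info


if __name__ == "__main__":
    for name in ("utf8mb4_0900_ai_ci", "utf8mb4_unicode_520_ci",
                 "utf8_general_ci", "utf8mb4_bin"):
        print(name, describe_collation(name))
```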

More info:

What are the differences between utf8_general_ci and utf8_unicode_ci?
What's the difference between utf8_general_ci and utf8_unicode_ci?
How to change collation of database, table, column?

Here's how to change all databases, tables, and columns. Run these queries; they output all of the subsequent queries needed to convert your entire schema to utf8. Hope this helps!

-- Change DATABASE Default Collation

SELECT DISTINCT concat('ALTER DATABASE `', TABLE_SCHEMA, '` CHARACTER SET utf8 COLLATE utf8_unicode_ci;')
from information_schema.tables
where TABLE_SCHEMA like  'database_name';

-- Change TABLE Collation / Char Set

SELECT concat('ALTER TABLE `', TABLE_SCHEMA, '`.`', table_name, '` CHARACTER SET utf8 COLLATE utf8_unicode_ci;')
from information_schema.tables
where TABLE_SCHEMA like 'database_name';

-- Change COLUMN Collation / Char Set

SELECT concat('ALTER TABLE `', t1.TABLE_SCHEMA, '`.`', t1.table_name, '` MODIFY `', t1.column_name, '` ', t1.data_type , '(' , t1.CHARACTER_MAXIMUM_LENGTH , ')' , ' CHARACTER SET utf8 COLLATE utf8_unicode_ci;')
from information_schema.columns t1
where t1.TABLE_SCHEMA like 'database_name' and t1.CHARACTER_SET_NAME = 'old_charset_name';
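Because the generator queries above only produce strings, the same three kinds of statements can be sketched in Python from metadata you have already fetched. This is a minimal sketch: `quote_ident` and `conversion_statements` are hypothetical helpers, the `(table, column, column_type)` input shape is an assumption, it uses COLUMN_TYPE instead of data_type plus CHARACTER_MAXIMUM_LENGTH, and it defaults to utf8mb4 following the warning about MySQL's utf8 below. Like the MODIFY query above, it restates only the type, so NOT NULL and DEFAULT attributes are not carried over.

```python
# Hypothetical helpers mirroring the three generator queries above.

def quote_ident(name: str) -> str:
    """Backtick-quote a MySQL identifier, doubling embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def conversion_statements(schema, tables, columns,
                          charset="utf8mb4", collation="utf8mb4_unicode_ci"):
    """Return ALTER DATABASE / ALTER TABLE / MODIFY statements as strings.

    tables:  table names in `schema`
    columns: (table_name, column_name, column_type) tuples, as could be
             read from information_schema.columns (assumed input shape)
    """
    spec = f"CHARACTER SET {charset} COLLATE {collation}"
    db = quote_ident(schema)
    statements = [f"ALTER DATABASE {db} {spec};"]
    for table in tables:
        # Changes only the table default, not the existing columns.
        statements.append(f"ALTER TABLE {db}.{quote_ident(table)} {spec};")
    for table, column, column_type in columns:
        # NOTE: like the SQL version, this does not restate NOT NULL / DEFAULT.
        statements.append(
            f"ALTER TABLE {db}.{quote_ident(table)} "
            f"MODIFY {quote_ident(column)} {column_type} {spec};"
        )
    return statements
```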

Beware that in MySQL, the utf8 character set covers only part of real UTF-8. To save one byte of storage, the MySQL team decided to store at most three bytes per UTF-8 character instead of the full four. That means some East Asian characters and emoji (anything outside the Basic Multilingual Plane) can't be stored. To make sure you can store all UTF-8 characters, use the utf8mb4 character set in MySQL, with a collation such as utf8mb4_bin or utf8mb4_general_ci.
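A quick way to see which data is affected is to count UTF-8 bytes per character: anything needing four bytes will not fit in MySQL's three-byte utf8. The sketch below assumes your data is available as Python strings; `max_utf8_bytes` and `needs_utf8mb4` are hypothetical helper names.

```python
def max_utf8_bytes(text: str) -> int:
    """Largest number of UTF-8 bytes any single character in text needs."""
    return max((len(ch.encode("utf-8")) for ch in text), default=0)


def needs_utf8mb4(text: str) -> bool:
    """True if text has a character MySQL's 3-byte utf8 cannot store."""
    return max_utf8_bytes(text) > 3


if __name__ == "__main__":
    # Latin letter, accented letter, CJK ideograph, emoji.
    for ch in ("A", "\u00e9", "\u4e2d", "\U0001F600"):
        print(f"U+{ord(ch):04X}", len(ch.encode("utf-8")))
    # prints: U+0041 1, U+00E9 2, U+4E2D 3, U+1F600 4 (one per line)
```

Only the last of those four characters requires utf8mb4; CJK ideographs outside the Basic Multilingual Plane (such as U+20000) also take four bytes.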

Adding to what David Whittaker posted, I have created a query that generates the complete ALTER TABLE statement (table and columns) needed to convert each table. It may be a good idea to run

SET SESSION group_concat_max_len = 100000;

first, to make sure your GROUP_CONCAT output doesn't exceed the very small default limit (1024 bytes).

    SELECT a.table_name, concat('ALTER TABLE ', a.table_schema, '.', a.table_name, ' DEFAULT CHARACTER SET utf8mb4 DEFAULT COLLATE utf8mb4_unicode_ci, ',
        group_concat(distinct(concat(' MODIFY ',  column_name, ' ', column_type, ' CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci ', if (is_nullable = 'NO', ' NOT', ''), ' NULL ',
        if (COLUMN_DEFAULT is not null, CONCAT(' DEFAULT \'', COLUMN_DEFAULT, '\''), ''), if (EXTRA != '', CONCAT(' ', EXTRA), '')))), ';') as alter_statement
    FROM information_schema.columns a
    INNER JOIN INFORMATION_SCHEMA.TABLES b ON a.TABLE_CATALOG = b.TABLE_CATALOG
        AND a.TABLE_SCHEMA = b.TABLE_SCHEMA
        AND a.TABLE_NAME = b.TABLE_NAME
        AND b.table_type = 'BASE TABLE'
    WHERE a.table_schema = ? and (collation_name = 'latin1_swedish_ci' or collation_name = 'utf8mb4_general_ci')
    GROUP BY a.table_name;

The differences from the previous answer: it used utf8 instead of utf8mb4, and combining t1.data_type with t1.CHARACTER_MAXIMUM_LENGTH didn't work for enums, which is why this query uses column_type. My query also excludes views, since those have to be altered separately.
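The column_type point is easy to see with an ENUM: data_type is just `enum`, so gluing a length onto it yields something like `enum(4)`, which is not a valid definition, while column_type holds the full `enum('new','done')`. Below is a Python sketch of the same per-table statement the query builds, working from already-fetched rows of information_schema.columns. The dict keys and function names are assumptions for illustration; unlike the SQL above, it doubles single quotes inside defaults instead of pasting them in raw.

```python
def modify_clause(col, charset="utf8mb4", collation="utf8mb4_unicode_ci"):
    """Build one MODIFY clause from an information_schema.columns row.

    col is a dict with keys column_name, column_type, is_nullable,
    column_default and extra (hypothetical input shape).
    """
    parts = [f"MODIFY `{col['column_name']}` {col['column_type']}",
             f"CHARACTER SET {charset} COLLATE {collation}",
             "NOT NULL" if col["is_nullable"] == "NO" else "NULL"]
    if col["column_default"] is not None:
        # Escape embedded single quotes rather than pasting the value raw.
        default = col["column_default"].replace("'", "''")
        parts.append(f"DEFAULT '{default}'")
    if col["extra"]:
        parts.append(col["extra"])
    return " ".join(parts)


def table_alter(schema, table, columns, charset="utf8mb4",
                collation="utf8mb4_unicode_ci"):
    """Combine every MODIFY clause for one table into a single ALTER TABLE."""
    head = (f"ALTER TABLE `{schema}`.`{table}` DEFAULT CHARACTER SET {charset} "
            f"DEFAULT COLLATE {collation}")
    clauses = [modify_clause(c, charset, collation) for c in columns]
    return ", ".join([head] + clauses) + ";"
```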

I used a Perl script to fetch all of these ALTER statements as an array, iterate over them, and fix the columns that were too long (generally they were varchar(256) when the data only had 20 characters in them, so that was an easy fix).
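The loop described above was written in Perl; an equivalent sketch in Python might look like this. It assumes a DB-API 2.0 cursor from a MySQL driver such as PyMySQL or mysql-connector-python, which typically expect `%s` placeholders rather than the `?` in the query above, and it assumes the ALTER statement is the last column of each row. `apply_generated_alters` is a hypothetical name. Failures are collected instead of aborting the run, so problem columns can be fixed by hand and retried.

```python
def apply_generated_alters(cursor, generator_sql, params=()):
    """Run a generator query, then execute each ALTER statement it returns.

    Works with any DB-API 2.0 cursor. The statement is assumed to be the
    last column of each row (alter_statement in the query above).
    Returns the statements that failed, paired with the error raised.
    """
    # Raise the GROUP_CONCAT limit (1024 bytes by default) for this session.
    cursor.execute("SET SESSION group_concat_max_len = 100000")
    cursor.execute(generator_sql, params)
    statements = [row[-1] for row in cursor.fetchall()]

    failures = []
    for stmt in statements:
        try:
            cursor.execute(stmt)
        except Exception as exc:  # driver-specific error classes vary
            failures.append((stmt, exc))
    return failures
```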

I found some data was corrupted when converting from latin1 to utf8mb4. It appeared that UTF-8-encoded characters stored in latin1 columns got mangled in the conversion. For the columns I knew would be a problem, I held the data in memory before and after the ALTER, compared the two, and generated UPDATE statements to fix the data.
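Corruption like this is typically classic mojibake: the bytes of a UTF-8 string sitting in a latin1 column are read back as latin1 characters, so é (bytes C3 A9) becomes Ã© after the conversion. Here is a minimal Python sketch of the effect, the usual repair, and the before/after comparison, assuming the bytes were reinterpreted as ISO-8859-1 (MySQL's latin1 is really cp1252, so characters such as € may need the `cp1252` codec instead). All function names are hypothetical, the `%s` placeholders assume a PyMySQL-style driver, and any repair should be checked against real data, since some genuine latin1 text can also happen to decode as UTF-8.

```python
def double_encode(text: str) -> str:
    """Simulate UTF-8 bytes being misread as latin1 (the usual mojibake)."""
    return text.encode("utf-8").decode("latin-1")


def fix_double_encoded(text: str) -> str:
    """Undo double_encode(); return text unchanged if it doesn't fit."""
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


def repair_updates(table, column, key, before, after):
    """Return (sql, params) UPDATEs restoring values the ALTER changed.

    before/after map primary-key value -> column value, captured before
    and after the conversion (hypothetical in-memory snapshots).
    Identifiers are assumed to be trusted and are not escaped here.
    """
    sql = f"UPDATE `{table}` SET `{column}` = %s WHERE `{key}` = %s"
    return [(sql, (value, pk)) for pk, value in before.items()
            if pk in after and after[pk] != value]
```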

The linked article describes the process well. However, any characters that didn't fit into latin1 in the first place are gone forever. UTF-8 is a SUPERSET of latin1, not the reverse. Most characters will map to a single latin1 byte, but any undefined ones will not (check a list of latin1 characters: not all 256 byte values are defined, depending on MySQL's latin1 definition).
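Python's codecs make the "depending on the definition" point visible. As a sketch: strict ISO-8859-1 (`latin-1`) decodes all 256 byte values, while Python's `cp1252` codec, which is closer to what MySQL calls latin1, leaves five positions undefined (MySQL itself maps those five to the matching U+0080 to U+009F code points). The same check shows € exists in cp1252 but not in ISO-8859-1, and that anything latin1 can hold survives a trip through UTF-8. The helper names are hypothetical.

```python
def undefined_bytes(codec: str) -> list:
    """Byte values 0-255 that the given single-byte codec cannot decode."""
    missing = []
    for b in range(256):
        try:
            bytes([b]).decode(codec)
        except UnicodeDecodeError:
            missing.append(b)
    return missing


def representable(text: str, codec: str) -> bool:
    """True if every character of text exists in the codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


def round_trips_through_utf8(codec: str) -> bool:
    """Check every decodable byte survives codec -> UTF-8 -> codec."""
    for b in range(256):
        raw = bytes([b])
        try:
            ch = raw.decode(codec)
        except UnicodeDecodeError:
            continue
        if ch.encode("utf-8").decode("utf-8").encode(codec) != raw:
            return False
    return True
```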
