Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Convert column collation to table/database default

Tags:

database

mysql

Every single post on SO that I have seen to accomplish this suggests running the following SQL:

ALTER TABLE <tablename> CONVERT TO CHARACTER SET utf8  COLLATE utf8_unicode_ci;

The problem with this, unless I am mistaken, is that it explicitly specifies the column collations, so you end up with something like this when you mysqldump the database:

  `address` varchar(150) COLLATE utf8_unicode_ci DEFAULT NULL,
  `city` varchar(100) COLLATE utf8_unicode_ci DEFAULT NULL,
  `state` varchar(2) COLLATE utf8_unicode_ci DEFAULT NULL,
  `zipcode` varchar(10) COLLATE utf8_unicode_ci DEFAULT NULL,

My question is.. is there no way to convert the column collations to the table or database default without doing this?

For example, I have tables that might look like this:

  `address` varchar(150) DEFAULT NULL,
  `city` varchar(100) DEFAULT NULL,
  `state` varchar(2) COLLATE utf8_general_ci DEFAULT NULL,
  `zipcode` varchar(10) COLLATE utf8_unicode_ci DEFAULT NULL,

What I want, is to convert all columns to utf8_unicode_ci (the table/database default), but not have each column explicitly set to that collation, so that when I mysqldump the converted table, it just looks like this:

  `address` varchar(150) DEFAULT NULL,
  `city` varchar(100) DEFAULT NULL,
  `state` varchar(2) DEFAULT NULL,
  `zipcode` varchar(10) DEFAULT NULL,

with a line at the end of the table creation statement that defines the default character set and collation: ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci

like image 469
bruchowski Avatar asked Jun 15 '15 17:06

bruchowski


People also ask

How do I change the default database collation?

Set or change the database collation using SSMS If you are creating a new database, right-click Databases and then select New Database. If you don't want the default collation, select the Options page, and select a collation from the Collation drop-down list.

What is default database collation?

Default server-level collation is SQL_Latin1_General_CP1_CI_AS.

How do I find my default database collation?

To view a collation setting for a column in Object ExplorerExpand Databases, expand the database and then expand Tables. Expand the table that contains the column and then expand Columns. Right-click the column and select Properties. If the collation property is empty, the column is not a character data type.

Is SQL_Latin1_General_CP1_CI_AS the same as Latin1_General_CI_AS?

The SQL_Latin1_General_CP1_CI_AS collation is a SQL collation and the rules around sorting data for unicode and non-unicode data are different. The Latin1_General_CI_AS collation is a Windows collation and the rules around sorting unicode and non-unicode data are the same.


1 Answers

If your table or column is different from the MySQL default, in my case latin1_sweedish_ci, then it will print out the collation with the column. See the following experimentation that demonstrates this.

To set the default character set, see this post.

First, lets create a datbase with two tables. One table has the character set and collation specified.

mysql> create database SO;
mysql> use SO;
mysql> create table test1 (col1 text, col2 text);
mysql> create table test2 (col1 text, col2 text) character set utf8 collate utf8_unicode_ci;

Now check the show create table to see what it looks like:

mysql> show create table test1;
    +-------+-----------------+
    | Table | Create Table
    +-------+-----------------+
    | test1 | CREATE TABLE `test1` (
      `col1` text,
      `col2` text
    ) ENGINE=InnoDB DEFAULT CHARSET=latin1
    +-------+-----------------+
    1 row in set (0.00 sec)

mysql> show create table test2;
    +-------+-----------------+
    | Table | Create Table
    +-------+-----------------+
    | test2 | CREATE TABLE `test2` (
      `col1` text COLLATE utf8_unicode_ci,
      `col2` text COLLATE utf8_unicode_ci
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
    +-------+-----------------+
    1 row in set (0.00 sec)

We see that test2 already looks like the columns are specified specifically rather than using the default. I suspect if it's different from the MySQL default it will list it rather than if it's different from the table default. Let's now see how they look in the information_schema database.

mysql> select table_schema, table_name, table_collation from information_schema.tables where table_schema = 'SO';
    +--------------+------------+-------------------+
    | table_schema | table_name | table_collation   |
    +--------------+------------+-------------------+
    | SO           | test1      | latin1_swedish_ci |
    | SO           | test2      | utf8_unicode_ci   |
    +--------------+------------+-------------------+
    2 rows in set (0.00 sec)

mysql> select table_schema, table_name, column_name, character_set_name, collation_name from information_schema.columns where table_schema = 'SO';
    +--------------+------------+-------------+--------------------+-------------------+
    | table_schema | table_name | column_name | character_set_name | collation_name    |
    +--------------+------------+-------------+--------------------+-------------------+
    | SO           | test1      | col1        | latin1             | latin1_swedish_ci |
    | SO           | test1      | col2        | latin1             | latin1_swedish_ci |
    | SO           | test2      | col1        | utf8               | utf8_unicode_ci   |
    | SO           | test2      | col2        | utf8               | utf8_unicode_ci   |
    +--------------+------------+-------------+--------------------+-------------------+
    4 rows in set (0.00 sec)

It looks like the columns have a specific character set and collation regardless of if we specified it. Lets update test1 to the prefered character set and collation and see what happens.

mysql> ALTER TABLE test1 CONVERT TO CHARACTER SET utf8  COLLATE utf8_unicode_ci;
    Query OK, 0 rows affected (0.05 sec)
    Records: 0  Duplicates: 0  Warnings: 0

mysql> show create table test1;
    +-------+-----------------+
    | Table | Create Table
    +-------+-----------------+
    | test1 | CREATE TABLE `test1` (
      `col1` mediumtext COLLATE utf8_unicode_ci,
      `col2` mediumtext COLLATE utf8_unicode_ci
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
    +-------+-----------------+
    1 row in set (0.00 sec)

mysql> show create table test2;
    +-------+-----------------+
    | Table | Create Table
    +-------+-----------------+
    | test2 | CREATE TABLE `test2` (
      `col1` text COLLATE utf8_unicode_ci,
      `col2` text COLLATE utf8_unicode_ci
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
    +-------+-----------------+
    1 row in set (0.00 sec)

Now they're both putting the collation in the show create table statement. Let's check the information_schema again.

mysql> select table_schema, table_name, table_collation from information_schema.tables where table_schema = 'SO';
    +--------------+------------+-----------------+
    | table_schema | table_name | table_collation |
    +--------------+------------+-----------------+
    | SO           | test1      | utf8_unicode_ci |
    | SO           | test2      | utf8_unicode_ci |
    +--------------+------------+-----------------+
    2 rows in set (0.00 sec)

mysql> select table_schema, table_name, column_name, character_set_name, collation_name from information_schema.columns where table_schema = 'SO';
    +--------------+------------+-------------+--------------------+-----------------+
    | table_schema | table_name | column_name | character_set_name | collation_name  |
    +--------------+------------+-------------+--------------------+-----------------+
    | SO           | test1      | col1        | utf8               | utf8_unicode_ci |
    | SO           | test1      | col2        | utf8               | utf8_unicode_ci |
    | SO           | test2      | col1        | utf8               | utf8_unicode_ci |
    | SO           | test2      | col2        | utf8               | utf8_unicode_ci |
    +--------------+------------+-------------+--------------------+-----------------+
    4 rows in set (0.00 sec)

Looks to be all about the same. But what happens when we add an extra column to both tables?

mysql> alter table test1 add column col3 text;
    Query OK, 0 rows affected (0.05 sec)
    Records: 0  Duplicates: 0  Warnings: 0

mysql> alter table test2 add column col3 text;
    Query OK, 0 rows affected (0.06 sec)
    Records: 0  Duplicates: 0  Warnings: 0

mysql> show create table test1;
    +-------+-----------------+
    | Table | Create Table
    +-------+-----------------+
    | test1 | CREATE TABLE `test1` (
      `col1` mediumtext COLLATE utf8_unicode_ci,
      `col2` mediumtext COLLATE utf8_unicode_ci,
      `col3` text COLLATE utf8_unicode_ci
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
    +-------+-----------------+
    1 row in set (0.00 sec)

mysql> show create table test2;
    +-------+-----------------+
    | Table | Create Table
    +-------+-----------------+
    | test2 | CREATE TABLE `test2` (
      `col1` text COLLATE utf8_unicode_ci,
      `col2` text COLLATE utf8_unicode_ci,
      `col3` text COLLATE utf8_unicode_ci
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
    +-------+-----------------+
    1 row in set (0.00 sec)

In both cases, they picked up the collation from the table. So there shouldn't be much worry about a column added later being out of whack. Let's check the information_schema one more time...

mysql> select table_schema, table_name, table_collation from information_schema.tables where table_schema = 'SO';
    +--------------+------------+-----------------+
    | table_schema | table_name | table_collation |
    +--------------+------------+-----------------+
    | SO           | test1      | utf8_unicode_ci |
    | SO           | test2      | utf8_unicode_ci |
    +--------------+------------+-----------------+
    2 rows in set (0.00 sec)

mysql> select table_schema, table_name, column_name, character_set_name, collation_name from information_schema.columns where table_schema = 'SO';
    +--------------+------------+-------------+--------------------+-----------------+
    | table_schema | table_name | column_name | character_set_name | collation_name  |
    +--------------+------------+-------------+--------------------+-----------------+
    | SO           | test1      | col1        | utf8               | utf8_unicode_ci |
    | SO           | test1      | col2        | utf8               | utf8_unicode_ci |
    | SO           | test1      | col3        | utf8               | utf8_unicode_ci |
    | SO           | test2      | col1        | utf8               | utf8_unicode_ci |
    | SO           | test2      | col2        | utf8               | utf8_unicode_ci |
    | SO           | test2      | col3        | utf8               | utf8_unicode_ci |
    +--------------+------------+-------------+--------------------+-----------------+
    6 rows in set (0.00 sec)

Yeah. All looks like it's working the same way. But what about that hypothesis about it only displaying if it is different from the MySQL default as opposed to the table default? Let's set test1 back to what it used to be.

mysql> ALTER TABLE test1 CONVERT TO CHARACTER SET latin1  COLLATE latin1_swedish_ci;
    Query OK, 0 rows affected (0.02 sec)
    Records: 0  Duplicates: 0  Warnings: 0

mysql> show create table test1;
    +-------+-----------------+
    | Table | Create Table
    +-------+-----------------+
    | test1 | CREATE TABLE `test1` (
      `col1` mediumtext,
      `col2` mediumtext,
      `col3` text
    ) ENGINE=InnoDB DEFAULT CHARSET=latin1
    +-------+-----------------+
    1 row in set (0.00 sec)

Seems to look just like when we started. Now to deomonstrate that it is the MySQL default and not just the database default, let's set the default for the database.

mysql> Alter database SO default character set utf8 collate utf8_unicode_ci;
    Query OK, 1 row affected (0.00 sec)

mysql> show create table test1;
    +-------+-----------------+
    | Table | Create Table
    +-------+-----------------+
    | test1 | CREATE TABLE `test1` (
      `col1` mediumtext,
      `col2` mediumtext,
      `col3` text
    ) ENGINE=InnoDB DEFAULT CHARSET=latin1
    +-------+-----------------+
    1 row in set (0.00 sec)

mysql> show create table test2;
    +-------+-----------------+
    | Table | Create Table
    +-------+-----------------+
    | test2 | CREATE TABLE `test2` (
      `col1` text COLLATE utf8_unicode_ci,
      `col2` text COLLATE utf8_unicode_ci,
      `col3` text COLLATE utf8_unicode_ci
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
    +-------+-----------------+
    1 row in set (0.00 sec)

As you can see, test1 is still looking like when we first started and the show create table is not affected by the database default.

like image 65
Cohan Avatar answered Oct 26 '22 21:10

Cohan