Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Migrating MySQL UTF8 to UTF8MB4 problems and questions

Im trying to convert my UTF8 MySQL 5.5.30 database to UTF8MB4. I have looked at this article https://mathiasbynens.be/notes/mysql-utf8mb4 but have some questions.

I have done these

ALTER DATABASE database_name CHARACTER SET = utf8mb4 COLLATE = utf8mb4_unicode_ci;
ALTER TABLE table_name CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

The last one was manually done with 62 tables, one of them gave me this warning

13:08:30 ALTER TABLE bradspelold.games CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci 101289 row(s) affected, 2 warning(s): 1071 Specified key was too long; max key length is 767 bytes 1071 Specified key was too long; max key length is 767 bytes Records: 101289 Duplicates: 0 Warnings: 2 3.016 sec

  1. Is this a problem? What could I do to fix it?

The next step is

ALTER TABLE table_name CHANGE column_name column_name
         VARCHAR(191) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
  1. Im not sure about the command, why is there 2 column_name?
  2. Should I do this only on the VARCHAR(191) columns? I dont think I have any of them?
  3. Do you know of any more artickels like this that explains more id detail why and how?

Edit :

Table Show games

CREATE  TABLE `games` (
        `id` int(10) unsigned NOT NULL DEFAULT \'0\',
        `name` varchar(255) NOT NULL,
        `description` mediumtext,
        `yearPublished` datetime NOT NULL,
        `minPlayers` int(10) unsigned NOT NULL,
        `maxPlayers` int(10) unsigned NOT NULL,
        `playingTime` varchar(127) NOT NULL,
        `grade` double NOT NULL DEFAULT \'0\',
        `updated` datetime NOT NULL,
        `forumParentId` int(10) unsigned DEFAULT \'0\',
        `lastVisited` datetime DEFAULT NULL,
        `inactivatedDate` datetime DEFAULT NULL,
        `bggGrade` double DEFAULT NULL,
        PRIMARY KEY (`id`),
        KEY `inactivatedDate` (`inactivatedDate`),
        KEY `name` (`name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8'

Edit 2:

    'CREATE TABLE `forum_threads` (
      `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
      `title` varchar(150) COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT '''',
      `description` varchar(150) COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT '''',
      `createdDate` datetime NOT NULL DEFAULT ''0000-00-00 00:00:00'',
      `createrId` int(10) unsigned DEFAULT NULL,
      `replys` int(10) unsigned NOT NULL DEFAULT ''0'',
      `lastPostUserId` int(10) unsigned DEFAULT NULL,
      `lastPostId` int(10) unsigned DEFAULT NULL,
      `forumId` int(10) unsigned DEFAULT NULL,
      `visits` int(10) unsigned NOT NULL DEFAULT ''0'',
      `lastPostCreated` datetime NOT NULL DEFAULT ''0000-00-00 00:00:00'',
      `lastPostNickName` varchar(30) COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT '''',
      `createrNickName` varchar(30) COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT '''',
      `solved` tinyint(1) NOT NULL DEFAULT ''0'',
      `locked` tinyint(1) NOT NULL DEFAULT ''0'',
      `lockedByUserId` int(10) unsigned NOT NULL DEFAULT ''0'',
      `lockedDate` datetime NOT NULL DEFAULT ''0000-00-00 00:00:00'',
      `alteredDate` datetime NOT NULL DEFAULT ''0000-00-00 00:00:00'',
      `alteredUserId` int(10) unsigned DEFAULT NULL,
      `glued` tinyint(1) NOT NULL DEFAULT ''0'',
      `pollId` int(10) unsigned DEFAULT NULL,
      `facebookPostId` bigint(20) DEFAULT NULL,
      `facebookImportedDate` datetime DEFAULT NULL,
      PRIMARY KEY (`id`),
      KEY `FK_forum_threads_1` (`forumId`),
      KEY `FK_forum_threads_2` (`pollId`),
      KEY `createdDate` (`createdDate`),
      KEY `createrId` (`createrId`),
      KEY `lastPostCreated` (`lastPostCreated`),
      CONSTRAINT `FK_forum_threads_1` FOREIGN KEY (`forumId`) REFERENCES `forum` (`id`) ON DELETE CASCADE
    ) ENGINE=InnoDB AUTO_INCREMENT=4306 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci'

'CREATE TABLE `forum` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `title` varchar(80) COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT '''',
  `description` varchar(150) COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT '''',
  `createdDate` datetime NOT NULL DEFAULT ''0000-00-00 00:00:00'',
  `threads` int(10) unsigned NOT NULL DEFAULT ''0'',
  `createrId` int(10) unsigned DEFAULT NULL,
  `lastPostUserId` int(10) unsigned DEFAULT NULL,
  `lastThreadId` int(10) unsigned DEFAULT NULL,
  `parentForumId` int(10) unsigned DEFAULT NULL,
  `lastPostNickName` varchar(30) COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT '''',
  `lastPostCreated` datetime NOT NULL DEFAULT ''0000-00-00 00:00:00'',
  `lastThreadTitle` varchar(80) COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT '''',
  `alteredDate` datetime NOT NULL DEFAULT ''0000-00-00 00:00:00'',
  `alteredUserId` int(10) unsigned DEFAULT NULL,
  `placeOrder` int(10) unsigned NOT NULL DEFAULT ''0'',
  `separator` tinyint(1) NOT NULL DEFAULT ''0'',
  `rightLevel` int(10) unsigned NOT NULL DEFAULT ''1'',
  `createChildForum` tinyint(3) unsigned NOT NULL DEFAULT ''1'',
  `createThreads` tinyint(3) unsigned NOT NULL DEFAULT ''1'',
  PRIMARY KEY (`id`),
  KEY `Index_1` (`id`,`parentForumId`)
) ENGINE=InnoDB AUTO_INCREMENT=375 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci'
like image 903
Banshee Avatar asked Mar 22 '15 12:03

Banshee


3 Answers

  1. There are limits on the size of an INDEX. You bumped into the limit because utf8mb4 needs up to 4 bytes per character, where as utf8 needs only 3. Meanwhile the INDEX size limit is in bytes.

The 'solution' is to decide what to do about the over-sized index. (more below)

2.

ALTER TABLE t CHANGE col col ...

is the same as the more logical

ALTER TABLE t MODIFY col ...

The former allows you to change the name of the column, hence two copies of the column name when you don't need to change the name.

  1. Quite likely you had VARCHAR(255) which takes 767 bytes in utf8 (3*255+2; the "2" is the size of the length field). The equivalent in the 4-byte utf8mb4 would be (191) (4*191+2=766; not room for more than 191).

  2. I have not seen an article about it. I suspect that what I just said is most of what needs to be said.

So...

Plan A: Do you have foo VARCHAR(255) and it was utf8? Is the data in it always (now and in the future) shorter than 191 characters? If so, then simply do the ALTER.

Plan B: If you need more than 191, do you really need the INDEX? DROP INDEX may be an alternative.

Plan C: Or, you could use a "prefix" index: INDEX(foo(191)), while leaving it VARCHAR(255). Usually "prefix" indexes are useless, but you might have a use case for which it works.

To discuss this further, please provide SHOW CREATE TABLE for the table in question, and discuss the meaning of that particular field and its INDEX.

like image 117
Rick James Avatar answered Sep 27 '22 20:09

Rick James


DB="database_name"
USER="mysql_user"
PASS="mysql_password"
(
    echo 'ALTER DATABASE `'"$DB"'` CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;'
    mysql -p$PASS -u $USER "$DB" -e "SHOW TABLES" --batch --skip-column-names \
    | xargs -I{} echo 'ALTER TABLE `'{}'` CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;'
) \
| mysql -p$PASS -u $USER "$DB"
  • To get the script you work open your command line and use the following steps:

    1. nano convert_to_utf8mb4.sh
    2. paste the script & save
    3. sudo chmod 755 convert_to_utf8mb4.sh (in terminal)
    4. run the script by type ./convert_to_utf8mb4.sh

    Yes, collation has been changed!

like image 42
Mahbub Tito Avatar answered Sep 27 '22 19:09

Mahbub Tito


this is an old question but following some of the answers here 5 years later is, as I discovered, a bad idea. Do not alter the size of your VARCHAR fields, you could damage your data and break everything.

In current versions of MySQL and MaraiDB add this to your config and it will support the larger keys necessary for UTF8mb4

innodb_large_prefix=1

I also suggest adding innodb_file_per_table =1 innodb_file_format=Barracuda

then the conversion will happen without errors/warnings about the key length

like image 33
EricG Avatar answered Sep 27 '22 18:09

EricG