Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What are the consequences if character sets for Triggers and Views do not match rest of database?

Tags:

mysql

We've recently gone through the effort of converting all our application's MySQL tables, columns, and .ini settings to utf8 encoding. However, we've found that views and triggers created prior to this change still have a reference to the latin1 character set--i.e. the following queries return records:

SELECT * FROM information_schema.triggers
WHERE trigger_schema=SCHEMA() 
AND    (collation_connection != 'utf8_general_ci' OR character_set_client != 'utf8')
;
SELECT * FROM information_schema.views
WHERE table_schema=SCHEMA()
AND   (collation_connection != 'utf8_general_ci' OR character_set_client != 'utf8')
;

Do I need to be concerned about this?

The MySQL documentation on information_schema.triggers and information_schema.views only says "The session value of the character_set_client system variable when the trigger was created." If that's all the value is storing, then is there any reason to try to fix them? (It doesn't sound like that would matter.) But on the other hand, I have to think the database developers chose to store it in the information_schema tables for some reason.

Production has been on utf8 for a while with the views and triggers still referencing latin1, and we haven't seen any issues (although we don't have a very large non-English user base). I've done some testing with different test strings and haven't seen any character corruption.

like image 383
Kip Avatar asked Jul 10 '15 14:07

Kip


2 Answers

See the bolded text in the quote below. If you used non-ASCII characters in your triggers/views, e.g. for the comparison with one of your UTF-8 columns, you're better off recreating them. If not it shouldn't matter since those variables are used to set the context for your object to be used/recreated later.

Quote from the Changes in MySQL 5.1.21 (2007-08-16)

Bugs Fixed

Incompatible Change: Several issues were identified for stored programs (stored procedures and functions, triggers, and events) and views containing non-ASCII symbols. These issues involved conversion errors due to incomplete character set information when translating these objects to and from stored format, such as:

  • Parsing the original object definition so that it can be stored.

  • Compiling the stored definition into executable form when the object is invoked.

  • Retrieval of object definitions from INFORMATION_SCHEMA tables.

  • Displaying the object definition in SHOW statements. This issue also affected mysqldump, which uses SHOW.

The fix for the problems is to store character set information from the object creation context so that this information is available when the object needs to be used later. The context includes the client character set, the connection character set and collation, and the collation of the database with which the object is associated.

As a result of the patch, several tables have new columns:

In the mysql database, the proc and event tables now have these columns: character_set_client, collation_connection, db_collation, body_utf8.

In INFORMATION_SCHEMA, the VIEWS table now has these columns: CHARACTER_SET_CLIENT, COLLATION_CONNECTION. The ROUTINES, TRIGGERS, and EVENTS tables now have these columns: CHARACTER_SET_CLIENT, COLLATION_CONNECTION, DATABASE_COLLATION.

These columns store the session values of the character_set_client and collation_connection system variables, and the collation of the database with which the object is associated. The values are those in effect at object creation time. (The saved database collation is not the value of the collation_database system variable, which applies to the default database; the database that contains the object is not necessarily the default database.)

like image 125
aergistal Avatar answered Sep 21 '22 05:09

aergistal


The different character sets shouldn't pose a problem unless you need to convert between formats. Then you need to make sure that there aren't extra or missing characters. One solution I found to this is to convert all data in your table to BLOB s and then change the character set and convert back to TEXT. https://www.percona.com/blog/2013/10/16/utf8-data-on-latin1-tables-converting-to-utf8-without-downtime-or-double-encoding/ has an explanation of one way to convert from latin1 to utf8, if you need more details.

like image 25
user2425429 Avatar answered Sep 24 '22 05:09

user2425429