Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to Use UTF-8 Collation in SQL Server database?

I've migrated a database from mysql to SQL Server (politics), original mysql database using UTF8.

Now I read https://dba.stackexchange.com/questions/7346/sql-server-2005-2008-utf-8-collation-charset that SQL Server 2008 doesn't support utf8, is this a joke?

The SQL Server hosts multiple databases, mostly Latin-encoded. Since the migrated db is intended for web publishing, I want to keep the utf8-encoding. Have I missed something or do I need to enc/dec at application level?

like image 657
Teson Avatar asked Sep 20 '12 12:09

Teson


People also ask

Does SQL Server support UTF-8?

SQL Server 2019 introduces support for the widely used UTF-8 character encoding. This has been a longtime requested feature and can be set as a database-level or column-level default encoding for Unicode string data.

How do I set my database to UTF-8?

To change the character set encoding to UTF-8 for the database itself, type the following command at the mysql> prompt. Replace dbname with the database name: Copy ALTER DATABASE dbname CHARACTER SET utf8 COLLATE utf8_general_ci; To exit the mysql program, type \q at the mysql> prompt.

What is UTF-8 collation?

A collation is a property of string types in SQL Server, Azure SQL, and Synapse SQL that defines how to compare and sort strings. In addition, it describes the encoding of string data. If a collation name in Synapse SQL ends with UTF8, it represents the strings encoded with the UTF-8 encoding schema.

Which UTF-8 collation should I use?

If you elect to use UTF-8 as your collation, always use utf8mb4 (specifically utf8mb4_unicode_ci). You should not use UTF-8 because MySQL's UTF-8 is different from proper UTF-8 encoding. This is the case because it doesn't offer full unicode support which can lead to data loss or security issues.


1 Answers

UTF-8 is not a character set, it's an encoding. The character set for UTF-8 is Unicode. If you want to store Unicode text you use the nvarchar data type.

If the database would use UTF-8 to store text, you would still not get the text out as encoded UTF-8 data, you would get it out as decoded text.

You can easily store UTF-8 encoded text in the database, but then you don't store it as text, you store it as binary data (varbinary).

like image 139
Guffa Avatar answered Oct 04 '22 01:10

Guffa