 

Is it possible to have SQL Server convert collation to UTF-8 / UTF-16?

In a project I am working on, my data is stored in SQL Server with the collation Danish_Norwegian_CI_AS. The data is output through FreeTDS and ODBC to Python, which handles the data as UTF-8. Some of the characters, like å, ø and æ, are not being encoded correctly, which has brought the project to a halt.
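To illustrate the symptom, here is a minimal Python sketch. It assumes the varchar data arrives as raw code page 1252 bytes (the code page behind Danish_Norwegian_CI_AS) and that nothing on the FreeTDS/ODBC path has converted them yet. The helper name `decode_column` is hypothetical and not part of any driver.

```python
# Assumption: the varchar column is stored in code page 1252 and the
# bytes reach Python unconverted.
raw = "åøæ".encode("cp1252")  # three single bytes: 0xE5 0xF8 0xE6


def decode_column(data: bytes, encoding: str = "cp1252") -> str:
    """Decode raw column bytes using the column's actual code page."""
    return data.decode(encoding)


# Treating the same bytes as UTF-8 fails, because 0xE5 announces a
# three-byte sequence and 0xF8 is not a valid continuation byte.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

text = decode_column(raw)          # correct text
utf8_bytes = text.encode("utf-8")  # now safe to hand on as UTF-8
```

If the bytes decode cleanly with the column's code page but not as UTF-8, the mismatch is most likely in the client or driver configuration rather than in the stored data.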

I spent a couple of hours reading about the confusing world of encodings, collation and code-pages, and feel like I have gotten a better understanding of the entire picture.

Some of the articles I have read make me think it would be possible to specify, in the SQL SELECT statement, that the data should be converted from its collation's encoding to UTF-8 when it is output.

I think this is possible because of this article, which shows an example of how to get two tables with different collations to play nicely together.

Any pointers in the direction of converting collation to UTF-8 / UTF-16 would be greatly appreciated!

EDIT: I have read that SQL Server provides a Unicode option through nchar, nvarchar and ntext, and that the other string types (char, varchar and text) are encoded according to the code page of the set collation. I have also read that the Unicode types are encoded in UCS-2, a variant of UTF-16 (I hope I am remembering that right). So, in order to allow tables with a locale-specific collation and Unicode tables to play nicely together, there should be a conversion function, no?
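The relationship between the two storage forms can be sketched on the Python side. This assumes code page 1252 for the varchar data and UTF-16 little-endian for the nvarchar data. The function `to_utf8` is a hypothetical helper, not a SQL Server or ODBC API.

```python
def to_utf8(data: bytes, source_encoding: str) -> bytes:
    """Re-encode bytes from source_encoding to UTF-8 via a Python str."""
    return data.decode(source_encoding).encode("utf-8")


# The same three characters in the two storage forms (assumed encodings).
varchar_bytes = "æøå".encode("cp1252")      # one byte per character
nvarchar_bytes = "æøå".encode("utf-16-le")  # two bytes per character here

# Both forms describe the same text, so both convert to the same UTF-8.
from_varchar = to_utf8(varchar_bytes, "cp1252")
from_nvarchar = to_utf8(nvarchar_bytes, "utf-16-le")
```

The conversion is only lossless when the source encoding is named correctly. The collation decides which code page applies to varchar data, while the nvarchar form does not depend on it.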

Asked by Rookie, May 16 '15 at 21:05





1 Answer

It seems that SQL Server does not support UTF-8 (see here; UTF-8 collations were only introduced in SQL Server 2019), but you can try changing the collation in the SELECT, like:

SELECT Account COLLATE SQL_Latin1_General_CP1_CI_AS
from Data

You can also strip the accents using this solution: How to remove accents and all chars <> a..z in sql-server?

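Another solution could be casting your column to nvarchar. Give the target type an explicit length, because a bare nvarchar in CAST defaults to 30 characters and silently truncates longer values:

-- nvarchar(max) is used here for illustration; pick a length that fits your data
SELECT CAST(Account AS nvarchar(max)) AS NewAccount
FROM Data

where Account is a varchar column in your original table.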

If for example you try:

SELECT cast(cast(N'ţ' as varchar) as nvarchar)

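the end result will be "ţ", provided the default collation of the database uses a code page that contains this character. If it does not, the inner cast to varchar replaces the character with a substitute and the outer cast cannot restore it.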
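The round trip through varchar can be mimicked in Python to show why the result depends on the code page. This is only a sketch: `varchar_round_trip` is a hypothetical helper, the code pages 1250 and 1252 are chosen for illustration, and Python's codecs substitute "?" for unmappable characters, which may differ from the substitute SQL Server picks.

```python
T_CEDILLA = "\u0163"  # ţ, the character used in the example above


def varchar_round_trip(text: str, code_page: str) -> str:
    """Mimic nvarchar -> varchar -> nvarchar for a given code page."""
    # errors="replace" substitutes "?" for characters the code page lacks.
    return text.encode(code_page, errors="replace").decode(code_page)


kept = varchar_round_trip(T_CEDILLA, "cp1250")  # Central European code page
lost = varchar_round_trip(T_CEDILLA, "cp1252")  # Western European code page
```

Code page 1250 contains ţ, so the character survives. Code page 1252 does not, so the information is gone after the first conversion.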

Answered by sbiz, Oct 28 '22 at 12:10
