Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to write UTF-8 characters using bulk insert in SQL Server?

Tags:

I am doing a BULK INSERT into sqlserver and it is not inserting UTF-8 characters into database properly. The data file contains these characters, but the database rows contain garbage characters after bulk insert execution.

My first suspect was the last line of the format file:

10.0 3 1 SQLCHAR  0  0  "{|}"  1 INSTANCEID "" 2 SQLCHAR  0  0  "{|}"  2 PROPERTYID "" 3 SQLCHAR  0  0  "[|]"  3 CONTENTTEXT "SQL_Latin1_General_CP1_CI_AS" 

But, after reading this official page it seems to me that this is actually a bug in reading the data file by the insert operation in SQL Server version 2008. We are using version 2008 R2.

What is the solution to this problem or at least a workaround?

like image 467
Goran Jovic Avatar asked Mar 31 '11 09:03

Goran Jovic


2 Answers

I came here before looking for a solution for bulk inserting special characters. Didn't like the workaround with UTF-16 (that would double the size of csv file). I found out that you definitely CAN and it's very easy, you don't need a format file. This answer is for other people who are looking for the same, since it doesn't seem to be documented well anywhere, and I believe this is a very common issue for non-english speaking people. The solution is: just add CODEPAGE='65001' inside the with statement of the bulk insert. (65001=codepage number for UTF-8). Might not work for all unicode characters as suggested by Michael O, but at least it works perfect for latin-extended, greek and cyrillic, probably many others too.

Note: MSDN documentation says utf-8 is not supported, don't believe it, for me this works perfect in SQL server 2008, didn't try other versions however.

e.g.:

BULK INSERT #myTempTable  FROM  'D:\somefolder\myCSV.txt'+ WITH      (          CODEPAGE = '65001',         FIELDTERMINATOR = '|',         ROWTERMINATOR ='\n'     ); 

If all your special characters are in 160-255 (iso-8859-1 or windows-1252), you could also use:

BULK INSERT #myTempTable  FROM  'D:\somefolder\myCSV.txt'+ WITH      (          CODEPAGE = 'ACP',         FIELDTERMINATOR = '|',         ROWTERMINATOR ='\n'     ); 
like image 142
Tom-K Avatar answered Sep 16 '22 15:09

Tom-K


You can't. You should first use a N type data field, convert your file to UTF-16 and then import it. The database does not support UTF-8.

like image 33
Michael-O Avatar answered Sep 18 '22 15:09

Michael-O