
How do you import UTF-8 flat files into SQL Server 2008 R2?

I have a bunch of UTF-8 encoded flat files that need to be imported into a SQL Server 2008 R2 database. Bulk inserts can neither identify the delimiters nor, it seems, accept UTF-8.

I understand that there are a number of articles on how SQL Server 2008 deals with UTF-8 encoding, but I'm looking for updated answers, as most of those articles are old.

Is there anything I can do to get these flat files into the database, either by converting them before the insert or by running a process during the insert?

I want to stay away from manually converting each one. Furthermore, the SSIS packages I've attempted to create can read and separate the data; they just can't move the data, it seems. :(

The flat files are generated by Java. Attempts to switch the Java environment from UTF-8 to any other encoding have been unsuccessful.

NOTE

I have no intention of storing UTF-8 data. My delimiter is coming in funky because it's UTF-8. SQL Server cannot read the characters when separating the columns and rows. That's it.

Fastidious asked Sep 11 '13 19:09



3 Answers

Not true; you simply need to choose code page 65001 (UTF-8).


Arthur answered Sep 28 '22 06:09


  1. Convert your data file to UTF-16 Little Endian (it must be Little Endian, not Big Endian).
  2. Use bcp with the -w option.
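
The conversion in step 1 can be scripted rather than done by hand. The following is only a minimal Python sketch, not part of the original answer: the function name utf8_to_utf16le and the chunk size are made up for illustration, and it assumes the input really is valid UTF-8. It drops a leading UTF-8 byte order mark if one is present and writes no BOM to the output; whether your bcp setup wants one is something to verify.

```python
def utf8_to_utf16le(src, dst, chunk_chars=1 << 20):
    """Re-encode a UTF-8 text file as UTF-16 Little Endian (no BOM written).

    Hypothetical helper for illustration; names and defaults are assumptions.
    """
    # "utf-8-sig" decodes UTF-8 and silently drops a leading BOM if present.
    # newline="" disables newline translation, so row terminators are kept as-is.
    with open(src, "r", encoding="utf-8-sig", newline="") as fin, \
         open(dst, "w", encoding="utf-16-le", newline="") as fout:
        while True:
            chunk = fin.read(chunk_chars)  # read in chunks to bound memory use
            if not chunk:
                break
            fout.write(chunk)
```

The converted file could then be loaded with something like bcp MyDb.dbo.MyTable in data_utf16.txt -w -T -S myserver, where the database, table, file, and server names are placeholders.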
loon3000 answered Sep 28 '22 06:09


Just for reference, in case someone googles this and lands here like I did.


I've tried the accepted answer a dozen times, with no success. In my case, the data file was a .csv flat file containing many accented characters, like ç é ã á.

I also noticed that no matter which encoding I chose, the import was performed using the 1252 (ANSI - Latin 1) encoding.

So the solution was to convert my .csv file, before the import, from UTF-8 to that same 1252 (ANSI - Latin 1) encoding. I did the conversion using Notepad++.

After converting it, I ran the regular import (through the SSMS Tasks -> "Import Data" wizard), selecting the 1252 (ANSI - Latin 1) encoding, and everything was imported correctly.
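
For more than a handful of files, the Notepad++ step could be replaced by a small script. This is a rough Python sketch under some assumptions: the target is Windows-1252 (the code page Windows labels ANSI - Latin I, which Python calls cp1252), the input is valid UTF-8, and the function name utf8_to_cp1252 is invented for illustration. By default it raises an error on any character that Windows-1252 cannot represent, rather than silently losing data.

```python
def utf8_to_cp1252(src, dst, errors="strict"):
    """Re-encode a UTF-8 text file as Windows-1252 (hypothetical helper).

    With errors="strict", a character that has no Windows-1252 equivalent
    raises UnicodeEncodeError; pass errors="replace" to write "?" instead.
    """
    # "utf-8-sig" drops a leading UTF-8 BOM if present.
    # newline="" keeps the original line endings untouched on both sides.
    with open(src, "r", encoding="utf-8-sig", newline="") as fin, \
         open(dst, "w", encoding="cp1252", errors=errors, newline="") as fout:
        for line in fin:
            fout.write(line)
```

Calling this in a loop over the files in a folder would avoid converting each one manually; the output files can then be imported with the wizard as described above.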


Environment:

SQL Server Web 2016

SQL Server Management Studio v17.9.1

Notepad++ v7.7.1


This also answers the OP's original question:

Is there anything I can do to get these flat files into the database, either by converting them before the insert or by running a process during the insert?

Vitox answered Sep 28 '22 06:09