Insert rows with Unicode characters using BCP

I'm using BCP to bulk upload data from a CSV file to SQL Azure (because BULK INSERT is not supported). This command runs and uploads the rows:

bcp [resource].dbo.TableName in C:\data.csv -t "," -r "0x0a" -c -U bcpuser@resource -S tcp:resource.database.windows.net

But data.csv is UTF8 encoded and contains non-ASCII strings. These get corrupted. I've tried changing the -c option to -w:

bcp [resource].dbo.TableName in C:\data.csv -t "," -r "0x0a" -w -U bcpuser@resource -S tcp:resource.database.windows.net

But then I get '0 rows copied'.

What am I doing wrong and how do I bulk insert Unicode characters using BCP?

asked Jan 06 '17 by mtmacdonald



2 Answers

But data.csv is UTF8 encoded

The UTF-8 encoding is the primary issue. Using -w won't help because in Microsoft-land, the term "Unicode" nearly always refers to UTF-16 Little Endian.

The solution depends on which version of BCP you are using, because support for UTF-8 (code page 65001) was only added in version 13.0 (SQL Server 2016):

  • If you are using BCP that came with SQL Server prior to SQL Server 2016 (version 13.0) then you need to convert the csv file to UTF-16 Little Endian (LE) as that is what Windows / SQL Server / .NET use for all strings. And use the -w switch.

    I got this to work encoding a file as "UCS-2 LE BOM" in Notepad++, whereas that same import file failed using the -c switch.

  • If you are using BCP that came with SQL Server 2016 (version 13.0) or newer, then you can simply add -c -C 65001 to the command line. -C is for "code page", and 65001 is the code page for UTF-8.
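If you are stuck on an older bcp, the UTF-8 to UTF-16 LE conversion can also be scripted instead of done by hand in Notepad++. A minimal Python sketch (the function name and file paths are just examples):

```python
# A minimal sketch: convert a UTF-8 CSV to UTF-16 LE with a BOM so that
# pre-2016 versions of bcp can load it via the -w switch.
def convert_to_utf16le(src_path, dst_path):
    with open(src_path, "r", encoding="utf-8") as src:
        text = src.read()
    with open(dst_path, "wb") as dst:
        dst.write(b"\xff\xfe")               # UTF-16 LE byte-order mark (BOM)
        dst.write(text.encode("utf-16-le"))  # payload, 2 bytes per ASCII char
```

Writing the BOM explicitly matters: it is what tools like Notepad++ (and SQL Server) use to detect the byte order, matching the "UCS-2 LE BOM" encoding mentioned above.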

The MSDN page for bcp Utility states (in the explanation of the -C switch):

Versions prior to version 13 (SQL Server 2016) do not support code page 65001 (UTF-8 encoding). Versions beginning with 13 can import UTF-8 encoding to earlier versions of SQL Server.

UPDATE

Support for UTF-8 / code page 65001 was added to SQL Server 2014 via SP2, as noted in this Microsoft KB article:

UTF-8 encoding support for the BCP utility and BULK INSERT Transact-SQL command in SQL Server 2014 SP2

answered Oct 07 '22 by Solomon Rutzky

The answer from Solomon helped me in my struggle with Unicode and SQL Server 2014. I'd like to share my experience here, in the hope that it helps the next person who runs into Unicode problems with BCP.

I had a hard time figuring out UTF-8 and Unicode handling in SQL Server 2014. I am using PowerShell to upload to a SQL Server 2014 SP2 database with BCP. My files are in Dutch, UTF-8 without BOM. I used PowerShell to convert the files into Microsoft's "Unicode" (UTF-16 LE):

Get-ChildItem "C:\Documents\ProjectA" -filter *.CSV |
ForEach-Object {
    # -Encoding Unicode writes UTF-16 LE with a BOM
    $path = $_.BaseName + '.unicode.CSV'
    Get-Content $_ | Set-Content -Encoding Unicode -Path $path
}

Then I used BCP without format file:

Get-ChildItem "C:\Documents\ProjectA" -filter *.unicode.CSV |
 ForEach-Object { 
   try { $output = bcp ProjectA.dbo.auditlog in $_.FullName -w "-t," -T -F2 
            if ($LASTEXITCODE)
            {  throw $output
            }
    catch
    { $Output >> C:\Documents\ProjectA\BCPCommandFailed$(get-date -f yyyy-MM-dd).log
    }
}

The conversion into Unicode causes file sizes to double, e.g. from 11,630 KB to 23,259 KB. A format file, whether XML or non-XML, did not work for me.
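The doubling is expected: UTF-16 LE stores every ASCII character in two bytes, so a mostly ASCII file is roughly twice as large as its UTF-8 version. A quick illustrative check in Python:

```python
# UTF-16 LE encodes each ASCII character as 2 bytes, so an ASCII-only
# line is exactly twice as large as in UTF-8.
line = "id,name,city\n"
utf8_size = len(line.encode("utf-8"))
utf16_size = len(line.encode("utf-16-le"))
print(utf8_size, utf16_size)  # → 13 26
```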

answered Oct 07 '22 by Pho