I'm using BCP to bulk upload data from a CSV file to SQL Azure (because BULK INSERT is not supported). This command runs and uploads the rows:
bcp [resource].dbo.TableName in C:\data.csv -t "," -r "0x0a" -c -U bcpuser@resource -S tcp:resource.database.windows.net
But data.csv is UTF8 encoded and contains non-ASCII strings. These get corrupted. I've tried changing the -c option to -w:
bcp [resource].dbo.TableName in C:\data.csv -t "," -r "0x0a" -w -U bcpuser@resource -S tcp:resource.database.windows.net
But then I get '0 rows copied'.
What am I doing wrong and how do I bulk insert Unicode characters using BCP?
SQL Server UNICODE() Function The UNICODE() function returns an integer value (the Unicode value), for the first character of the input expression.
To place each data record on its own line in the data file, specify the combination \r\n as the row terminator. When you use bcp interactively and specify \n (newline) as the row terminator, bcp automatically prefixes it with a \r (carriage return) character, which results in a row terminator of \r\n.
Basically you need to read the file using Codepage 65001 (UTF-8), then use Derived Column transformation where you would add a derived column for every text field in the csv file using cast expressions, for example (DT_STR,50,1252)LastName.
But data.csv is UTF8 encoded
The UTF-8 encoding is the primary issue. Using -w
won't help because in Microsoft-land, the term "Unicode" nearly always refers to UTF-16 Little Endian.
The solution will depend on which version of BCP you are using as an option was added in the newest version (13.0 / 2016):
If you are using BCP that came with SQL Server prior to SQL Server 2016 (version 13.0) then you need to convert the csv file to UTF-16 Little Endian (LE) as that is what Windows / SQL Server / .NET use for all strings. And use the -w
switch.
I got this to work encoding a file as "UCS-2 LE BOM" in Notepad++, whereas that same import file failed using the -c
switch.
If you are using BCP that came with SQL Server 2016 (version 13.0) or newer, then you can simply add -c -C 65001
to the command line. -C
is for "code page", and 65001 is the code page for UTF-8.
The MSDN page for bcp Utility states (in the explanation of the -C
switch):
Versions prior to version 13 (SQL Server 2016) do not support code page 65001 (UTF-8 encoding). Versions beginning with 13 can import UTF-8 encoding to earlier versions of SQL Server.
UPDATE
Support for UTF-8 / code page 65001 was added to SQL Server 2014 via SP2, as noted in this Microsoft KB article:
UTF-8 encoding support for the BCP utility and BULK INSERT Transact-SQL command in SQL Server 2014 SP2
The answer from Solomon helped me in my struggle with Unicode and SQL Server 2014. I would like to share my experience about Unicode here. I hope this helps the next person who suffers from Unicode problems with BCP.
I have had a hard time figuring out the UTF and Unicode of SQL Server 2014. I am using Powershell to upload using BCP to a SQL Server 2014 SP2 database. My files are in Dutch, UTF-8 without BOM. I used Powershell to convert the files into microsoft's Unicode:
Get-ChildItem "C:\Documents\ProjectA" -filter *.CSV |
ForEach-Object {
$path = $_.basename + '.unicode.CSV'
get-content $_ | Set-Content -Encoding Unicode -path $path
}
Then I used BCP without format file:
Get-ChildItem "C:\Documents\ProjectA" -filter *.unicode.CSV |
ForEach-Object {
try { $output = bcp ProjectA.dbo.auditlog in $_.FullName -w "-t," -T -F2
if ($LASTEXITCODE)
{ throw $output
}
catch
{ $Output >> C:\Documents\ProjectA\BCPCommandFailed$(get-date -f yyyy-MM-dd).log
}
}
The conversion into Unicode causes file sizes to double e.g. from 11,630KB into 23,259KB. Template file whether XML or non-XML did not work.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With