Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

freebcp: "Unicode data is odd byte size for column. Should be even byte size"

This file works fine (UTF-8):

$ cat ok.txt
291054  Ţawī Rifā

This file causes an error (UTF-8):

$ cat bad.txt
291054  Ţawī Rifā‘

Here's the message:

$ freebcp 'DB.dbo.table' in bad.txt ... -c
Starting copy...
Msg 20050, Level 4
Attempt to convert data stopped by syntax error in source field

Msg 4895, Level 16, State 2
Server '...', Line 1
    Unicode data is odd byte size for column 2. Should be even byte size.
Msg 20018, Level 16
General SQL Server error: Check messages from the SQL Server

The only difference is the last character, which is unicode 2018 (left single quotation mark)

Any idea what is causing this error?

The SQL Server uses UTF-16LE (though TDS starts with UCS-2LE and switches over I believe)

The column in question is nvarchar(200)

Here's the packet sent right before the error:

packet.c:741:Sending packet
0000 07 01 00 56 00 00 01 00-81 02 00 00 00 00 00 08 |...V.... ........|
0010 00 38 09 67 00 65 00 6f-00 6e 00 61 00 6d 00 65 |.8.g.e.o .n.a.m.e|
0020 00 69 00 64 00 00 00 00-00 09 00 e7 90 01 09 04 |.i.d.... ...ç....|
0030 d0 00 34 04 6e 00 61 00-6d 00 65 00 d1 ee 70 04 |Ð.4.n.a. m.e.Ñîp.|
0040 00 13 00 62 01 61 00 77-00 2b 01 20 00 52 00 69 |...b.a.w .+. .R.i|
0050 00 66 00 01 01 18      -                        |.f....|
like image 306
Neil McGuigan Avatar asked Aug 16 '16 20:08

Neil McGuigan


1 Answers

Update: This issue has apparently been fixed in FreeTDS v1.00.16, released 2016-11-04.


I can reproduce your issue using FreeTDS v1.00.15. It definitely looks like a bug in freebcp that causes it to fail when the last character of a text field has a Unicode code point of the form U+20xx. (Thanks to @srutzky for correcting my conclusion as to the cause.) As you noted, this works ...

291054  Ţawī Rifā

... and this fails ...

291054  Ţawī Rifā‘

... but I found that this also works:

291054  Ţawī Rifā‘x

So, an ugly workaround would be to run a script against your input file that would append a low-order non-space Unicode character to each text field (e.g., x which is U+0078, as in the last example above), use freebcp to upload the data, and then run an UPDATE statement against the imported rows to strip off the extra character.

Personally, I would be inclined to switch from FreeTDS to Microsoft's SQL Server ODBC Driver for Linux, which includes the bcp and sqlcmd utilities when installed using the instructions described here:

https://gallery.technet.microsoft.com/scriptcenter/SQLCMD-and-BCP-for-Ubuntu-c88a28cc

I just tested it under Xubuntu 16.04, and although I had to tweak the procedure a bit to use libssl.so.1.0.0 instead of libssl.so.0.9.8 (and the same for libcrypto), once I got it installed the bcp utility from Microsoft succeeded where freebcp failed.

If the SQL Server ODBC Driver for Linux will not work on a Mac then another alternative would be to use the Microsoft JDBC Driver 6.0 for SQL Server and a little bit of Java code, like this:

connectionUrl = "jdbc:sqlserver://servername:49242"
        + ";databaseName=myDb"
        + ";integratedSecurity=false";
String myUserid = "sa", myPassword = "whatever";

String dataFileSpec = "C:/Users/Gord/Desktop/bad.txt";
try (
        Connection conn = DriverManager.getConnection(connectionUrl, myUserid, myPassword);
        SQLServerBulkCSVFileRecord fileRecord = new SQLServerBulkCSVFileRecord(dataFileSpec, "UTF-8", "\t", false);
        SQLServerBulkCopy bulkCopy = new SQLServerBulkCopy(conn)) {
    fileRecord.addColumnMetadata(1, "col1", java.sql.Types.NVARCHAR, 50, 0);
    fileRecord.addColumnMetadata(2, "col2", java.sql.Types.NVARCHAR, 50, 0);
    bulkCopy.setDestinationTableName("dbo.freebcptest");
    bulkCopy.writeToServer(fileRecord);
} catch (Exception e) {
    e.printStackTrace(System.err);
}
like image 147
Gord Thompson Avatar answered Oct 04 '22 04:10

Gord Thompson