Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Character set mismatch on Linux with ODBC to SQL Server

I've got a funny issue trying to insert non-ASCII characters into a SQL Server database, using the Microsoft ODBC driver for Linux. The problem is it seems to be assuming different character sets when sending and receiving data. For info, the server collation is set to Latin1_General_CI_AS (I'm only trying to insert European accent characters).

Testing with tsql (which came with FreeTDS), everything is fine. On startup, it outputs the following:

locale is "en_GB.utf8"
locale charset is "UTF-8"
using default charset "UTF-8"

I can both insert and select a non-ASCII value into a table.

However, using my own utility which uses the ODBC API, it's not working. When I do a select query, the data comes back in UTF-8 character set as desired. However if I insert UTF-8 characters, they get corrupted.

SQL > update test set a = 'Béthune';
Running SQL: update test set a = 'Béthune'
Query executed OK: 1 affected rows
SQL > select * from test;
Running SQL: select * from test
+------------+
| a          |
+------------+
| Béthune |
+------------+

If I instead insert the data encoded in ISO-8859-1, then that works correctly, however the select query will still return it encoded in UTF-8!

I've already got the locale set to en_GB.utf8, and a client charset of UTF-8 in the database connection details. Aargh!

FWIW I seem to be getting the same problem whether I use the FreeTDS driver or the official Microsoft driver.

EDIT: Just realised one relevant point, which is that in this test program, it isn't using a prepared statement with bound variables. In other words, the update SQL is passed directly into the SQLPrepare call. Something in ODBC is definitely doing an iconv translation, but evidently not to the correct character set!

#0  0x0000003d4c41f850 in iconv () from /lib64/libc.so.6
#1  0x0000003d4d83fd94 in ?? () from /usr/lib64/libodbc.so.2
#2  0x0000003d4d820465 in SQLPrepare () from /usr/lib64/libodbc.so.2

I'll try compiling my own UnixODBC to see better what's going on.

EDIT 2: I've built UnixODBC from source to debug what it's doing, and the problem is nl_langinfo(CODESET) reports back ISO-8859-1. That is strange, since the man page for it says it's the same string you get from locale charmap, which returns UTF-8. I'm guessing that's the problem but still not sure how to solve.

like image 546
asc99c Avatar asked May 10 '16 14:05

asc99c


2 Answers

A colleague at work has just figured out the solution for FreeTDS at least.

For a direct driver connection (SQLDriverConnect()), adding ClientCharset=UTF-8;ServerCharset=CP1252; to the connection string fixed the problem

For a connection via the driver manager (SQLConnect()), I can add these lines to the connection settings in odbc.ini:

client charset = UTF-8
server charset = CP1252

Can't yet figure out a solution using the Microsoft driver ...

like image 108
asc99c Avatar answered Sep 20 '22 02:09

asc99c


A solution for Microsoft ODBC Driver might be to set a proper value into the LANG environment variable.

Make sure you have your required locale installed and configured. Also make sure that the LANG environment variable is set correctly for the user you are running your application under. This might be tricky for daemons. For example to make it work for PHP with Apache2 I had to add export LANG=en_US.utf8 into /etc/apache2/envvars.

like image 25
P-39 Airacobra Avatar answered Sep 19 '22 02:09

P-39 Airacobra