 

PDO DBLIB multibyte (Chinese) character encoding - SQL Server

On a Linux machine, I am using PDO DBLIB to connect to an MSSQL database and insert data into a table with SQL_Latin1_General_CP1_CI_AS collation. The problem is that when I try to insert Chinese (multibyte) characters, they are inserted as 哈市香åŠåŒºç æ±Ÿè·¯å·.
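The garbled text is consistent with the UTF-8 bytes of each character being read back as code page 1252, the code page behind SQL_Latin1_General_CP1_CI_AS. A minimal PHP sketch, assuming the mbstring extension is available, reproduces the effect for the first character:

```php
<?php
// Take the UTF-8 bytes of one Chinese character...
$original = "哈";                  // U+54C8
echo bin2hex($original), PHP_EOL;  // e59388

// ...and reinterpret those three bytes as Windows-1252 text, then re-encode as UTF-8.
$garbled = mb_convert_encoding($original, 'UTF-8', 'Windows-1252');
echo $garbled, PHP_EOL;            // å“ˆ, the same sequence that starts the garbled output
```

This only illustrates one plausible way the bytes get misread; it does not show where in the PHP/FreeTDS/SQL Server chain the misinterpretation happens.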

Here is the relevant part of my code:

$DBH = new PDO("dblib:host=$myServer;dbname=$myDB;", $myUser, $myPass);

$query = "
    INSERT INTO UserSignUpInfo
    (FirstName)
    VALUES
    (:firstname)";

$STH = $DBH->prepare($query);

// $firstname holds the UTF-8 encoded input, e.g. a Chinese name
$STH->bindParam(':firstname', $firstname);
$STH->execute();

What I've tried so far:

  1. Converting $firstname to UTF-16LE with mb_convert_encoding and casting the parameter AS VARBINARY in the query:

    $firstname = mb_convert_encoding($firstname, 'UTF-16LE', 'UTF-8');

    VALUES
    (CAST(:firstname AS VARBINARY));
    

    This inserts the characters correctly, but as soon as the string also contains single-byte (non-multibyte) characters, the PDO execute() call fails.

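  2. Setting the connection character set to UTF-8:

    $DBH = new PDO("dblib:host=$myServer;dbname=$myDB;charset=UTF-8;", $myUser, $myPass);
    // Note: SET CHARACTER SET and SET NAMES are MySQL statements; SQL Server does not recognise them
    $DBH->exec('SET CHARACTER SET utf8');
    $DBH->query("SET NAMES utf8");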
    
  3. Setting the client charset to UTF-8 in my freetds.conf.

    Neither of these last two attempts had any effect.

Is there any way at all to insert multibyte data into that SQL database? Is there any other workaround? I've thought of trying PDO_ODBC or even the mssql extension, but thought it was better to ask here before wasting any more time.

Thanks in advance.

EDIT:

I ended up using the mssql extension with the N prefix on string literals. I will switch to PDO_ODBC and try it when I have more time. Thanks everyone for the answers!
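Without the N prefix, a T-SQL string literal is typed as varchar in the database's default code page, so characters outside code page 1252 are lost before they reach an nvarchar column. The mssql extension has no parameter binding, so an N-prefixed literal has to be built by hand. A minimal sketch, with a hypothetical helper name, that escapes embedded single quotes by doubling them (the standard T-SQL rule):

```php
<?php
/**
 * Hypothetical helper: wrap a UTF-8 string as a T-SQL national (N'...') literal.
 * Only single quotes are escaped (by doubling), which is what T-SQL string
 * literals require. Prefer bound parameters wherever the driver supports them.
 */
function nvarcharLiteral(string $value): string
{
    return "N'" . str_replace("'", "''", $value) . "'";
}

$sql = "INSERT INTO UserSignUpInfo (FirstName) VALUES (" . nvarcharLiteral("O'Brien 输入") . ")";
echo $sql, PHP_EOL;
```

On PHP 7.2 and later, PDO also defines the PDO::PARAM_STR_NATL flag, which can be OR-ed with PDO::PARAM_STR (for example in bindValue()) to ask the driver to send the value as a national string; whether PDO_DBLIB honours it in a given setup is worth verifying before relying on it.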

Manolis, asked Feb 26 '15 10:02




2 Answers

Is there any way at all to insert multibyte data in [this particular] SQL database? Is there any other workaround?

  1. If you can switch to PDO_ODBC, Microsoft provides free SQL Server ODBC drivers for Linux (at the time of writing, only for 64-bit Red Hat Enterprise Linux and 64-bit SUSE Linux Enterprise) which support Unicode.

  2. If you can switch to PDO_ODBC, the N prefix for inserting Unicode literals will work.

  3. If you can change the affected column to store Unicode, either as nvarchar (which SQL Server stores as UTF-16) or, on SQL Server 2019 and later, with a UTF-8 collation, that would be ideal.

Your case is more restricted: the input strings mix multibyte and single-byte characters, they must be saved to a Latin1 table, the N prefix isn't working, and you don't want to move away from PDO DBLIB (because Microsoft's Unicode ODBC driver for PDO_ODBC is barely supported on Linux). For that case, here is one workaround.

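Conditionally encode the input string as base64. Base64 output contains only ASCII characters, so it can be stored in a Latin1 column without loss; it is the same trick email uses to carry images and other binary data inline.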
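The cost of this approach is size: every 3 input bytes become 4 output characters, and each Chinese character already takes 3 bytes in UTF-8, so the column must be wide enough for the encoded form. A small sketch for checking that; the helper name and the 50-character column width are hypothetical, so substitute the real FirstName definition:

```php
<?php
// Hypothetical helper: number of characters base64_encode() produces for
// a given number of input bytes (4 characters per 3-byte group, padded).
function base64Length(int $bytes): int
{
    return 4 * intdiv($bytes + 2, 3);
}

$firstname = "输入中国文字!Okay!";
$bytes = strlen($firstname);            // UTF-8 byte count: 6 * 3 + 6 = 24
$encodedLength = base64Length($bytes);  // 32

// Assumed column width; check the actual table definition.
$columnWidth = 50;
if ($encodedLength > $columnWidth) {
    throw new LengthException("Encoded value ($encodedLength chars) exceeds column width ($columnWidth)");
}
echo "$bytes bytes -> $encodedLength base64 characters", PHP_EOL;
```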

Working Example:

$DBH = new PDO("dblib:host=$myServer;dbname=$myDB;", $myUser, $myPass);

$query = "
INSERT INTO [StackOverflow].[dbo].[UserSignUpInfo]
           ([FirstName])
     VALUES
           (:firstname)";

$STH = $DBH->prepare($query);

$firstname = "输入中国文字!Okay!";

/* First, check if this string has any non-ASCII (i.e. multibyte UTF-8) bytes at all */
if (preg_match('/[^\x00-\x7F]/', $firstname) === 1) {
    /* If so, change the string to base64. */
    $firstname = base64_encode($firstname);
}

$STH->bindParam(':firstname', $firstname);
$STH->execute();

To read the entries back, select them, test each one for base64, and decode only those, so that existing unencoded entries are left untouched:

$STH = $DBH->query("SELECT [FirstName] FROM [StackOverflow].[dbo].[UserSignUpInfo]");

while ($row = $STH->fetch(PDO::FETCH_NUM)) {
    $entry = $row[0];
    $decoded = base64_decode($entry, true);

    if ($decoded !== false && base64_encode($decoded) === $entry) {

         /* Decoding and re-encoding a true base64 string results in the original entry */
         echo $decoded . PHP_EOL;

    } else {

         /* Previous entries not encoded will fall through gracefully */
         echo $entry . PHP_EOL;
    }
}
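One caveat with the round-trip test: ordinary ASCII names can also be valid base64. For example, "John" consists of four characters from the base64 alphabet, so base64_decode("John", true) succeeds and re-encoding the result gives "John" again; the loop above would then print three garbage bytes instead of the name. A more robust variant marks encoded values with a prefix. This is a sketch only: the "b64:" marker and the function names are hypothetical, and the marker should be something real data never starts with.

```php
<?php
// Hypothetical marker that flags values stored as base64.
const B64_PREFIX = 'b64:';

// Encode only strings containing non-ASCII bytes, and mark them with the prefix.
function encodeForLatin1Column(string $value): string
{
    if (preg_match('/[^\x00-\x7F]/', $value) === 1) {
        return B64_PREFIX . base64_encode($value);
    }
    return $value;
}

// Decode only values that carry the prefix; anything else is returned unchanged,
// so older unencoded rows still read correctly.
function decodeFromLatin1Column(string $stored): string
{
    if (strncmp($stored, B64_PREFIX, strlen(B64_PREFIX)) === 0) {
        $decoded = base64_decode(substr($stored, strlen(B64_PREFIX)), true);
        if ($decoded !== false) {
            return $decoded;
        }
    }
    return $stored;
}

foreach (["John", "Guan Tianlang", "输入中国文字!Okay!"] as $name) {
    $stored = encodeForLatin1Column($name);
    echo $stored, " => ", decodeFromLatin1Column($stored), PHP_EOL;
}
```

The prefix costs four extra characters per encoded value, which must also fit in the column.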

Entries will be saved like this:

Guan Tianlang
6L6T5YWl5Lit5Zu95paH5a2XIU9rYXkh

But you can easily convert them back to:

Guan Tianlang
输入中国文字!Okay!
Drakes, answered Sep 29 '22 02:09


Collation shouldn't matter here.

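Characters outside the column's code page, such as Chinese, need to be stored in nvarchar, nchar, or ntext columns (ntext is deprecated; use nvarchar(max) instead). You don't need to perform any casting.

The N in these type names (and in N'...' string literals) stands for National, and it causes SQL Server to store the text as Unicode (UTF-16).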
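To make the storage difference concrete: nchar and nvarchar hold UTF-16 code units, two bytes each for characters in the Basic Multilingual Plane (which covers common Chinese characters), while UTF-8 needs three bytes for the same characters. A quick PHP check with mbstring; it illustrates the encodings only, not SQL Server itself:

```php
<?php
$text = "中国文字";  // four Chinese characters, all in the Basic Multilingual Plane

$utf8Bytes  = strlen($text);                                           // 3 bytes per character in UTF-8
$utf16Bytes = strlen(mb_convert_encoding($text, 'UTF-16LE', 'UTF-8')); // 2 bytes per character in UTF-16

printf("UTF-8: %d bytes, UTF-16LE: %d bytes\n", $utf8Bytes, $utf16Bytes);
```

UTF-16LE is also the byte form produced by the question's mb_convert_encoding(..., 'UTF-16LE', 'UTF-8') attempt; the second figure should match what DATALENGTH() reports for the same value in an nvarchar column.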

Edit:

PDO_DBLIB does not support Unicode, and is now deprecated.

If you can switch to PDO_ODBC, Microsoft provides free SQL Server ODBC drivers for Linux which support Unicode.
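With PDO_ODBC, the part of the DSN after odbc: can be a full ODBC connection string that is passed to the driver manager. A minimal sketch, assuming unixODBC with a Microsoft driver registered as "ODBC Driver 17 for SQL Server" (the registered name varies by driver version; check odbcinst.ini) and a hypothetical helper name:

```php
<?php
// Hypothetical helper: build a PDO_ODBC DSN for Microsoft's SQL Server ODBC driver.
// Assumes the values contain no ';', '{' or '}' characters (no escaping is done).
function buildSqlServerOdbcDsn(string $driver, string $server, string $database): string
{
    return sprintf('odbc:Driver={%s};Server=%s;Database=%s', $driver, $server, $database);
}

$dsn = buildSqlServerOdbcDsn('ODBC Driver 17 for SQL Server', 'db.example.com', 'StackOverflow');
echo $dsn, PHP_EOL;

// Connecting requires the pdo_odbc extension, unixODBC and the driver to be installed:
// $DBH = new PDO($dsn, $myUser, $myPass);
```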

Microsoft - SQL Server ODBC Driver Documentation

Blog - Installing and Using the Microsoft SQL Server ODBC Driver for Linux

Jon Tirjan, answered Sep 29 '22 01:09