
How to set encoding for pervasive database via ODBC in PHP?

I developed a PHP script that should connect to a Pervasive database system:

$connection_string = "Driver={Pervasive ODBC Client Interface};ServerName=127.0.0.1;dbq=@test"; 
$conn = odbc_connect($connection_string,"administrator","password");

If I execute a query, the returned data is not UTF-8; mb_detect_encoding tells me the encoding is ASCII. I tried to convert the data via iconv, but it doesn't work. So I tried something like this to change the encoding after the script connected:

odbc_exec($conn, "SET NAMES 'UTF8'");
odbc_exec($conn, "SET client_encoding='UTF-8'");

But nothing helps! Can anyone help me? Thanks.

------------------------------ edit -------------------------------

Here is the complete script, because nothing has worked so far:

class api {

    function doRequest($Url){
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_URL, $Url);
        curl_setopt($ch, CURLOPT_REFERER, "http://www.example.org/yay.htm");
        curl_setopt($ch, CURLOPT_USERAGENT, "MozillaXYZ/1.0");
        curl_setopt($ch, CURLOPT_HEADER, 0);
        curl_setopt($ch, CURLOPT_TIMEOUT, 10);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_ENCODING, 'UTF-8');
        $output = curl_exec($ch);
        curl_close($ch);
    }

}

$connection_string = "Driver={Pervasive ODBC Client Interface};ServerName=127.0.0.1;dbq=@test;Client_CSet=UTF-8;Server_CSet=UTF-8"; 
$conn = odbc_connect($connection_string,"administrator","xxx");

if ($conn) {

    $sql = "SELECT field FROM table where primaryid = 102"; 
    $cols = odbc_exec($conn, $sql);

    while( $row = odbc_fetch_array($cols) ) { 

        $api = new api(); 
        // --- 1 ---
        $api->doRequest("http://example.de/api.html?value=" . @urlencode($row["field"])); 
        // --- 2 ---
        $api->doRequest("http://example.de/api.html?value=" . $row["field"]); 
        // --- 3 ---
        $api->doRequest("http://example.de/api.html?value=" . utf8_decode($row["field"])); 

    }

}

The server log says the following:

--- 1 --- [24/May/2016:14:05:07 +0200] "GET /api.html?value=Talstra%E1e+7++++++++++++++++++++++++++++++++++++++++++++++++ HTTP/1.1" 200 93 "http://www.example.org/yay.htm" "MozillaXYZ/1.0"
--- 2 --- [24/May/2016:11:31:10 +0200] "GET /api.html?value=Talstra\xe1e 7                                                 HTTP/1.1" 200 83 "http://www.example.org/yay.htm" "MozillaXYZ/1.0"
--- 3 --- [24/May/2016:14:05:07 +0200] "GET /api.html?value=Talstra?e 7                                                 HTTP/1.1" 200 93 "http://www.example.org/yay.htm" "MozillaXYZ/1.0"

%E1 stands for á, but it should be ß (German character)

\xe1 stands for á, but it should be ß (German character)

Tobias Bambullis asked May 20 '16 12:05


2 Answers

Your database is in extended ASCII, not just ASCII

The clue lies here:

%E1 stands for á, but it should be ß (German character)

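%E1, or 225 in decimal, stands for á in Latin-1 (ISO-8859-1, which matches the Unicode code point U+00E1). In the extended ASCII code pages CP437 and CP850, it's ß. Hold Alt and type 225 on a Windows numeric keypad, and you get a ß.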
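The difference is easy to check with iconv. This is a minimal sketch, assuming your PHP build's iconv recognises the names CP437 and CP850 (glibc and GNU libiconv do); the variable names are only for illustration.

```php
<?php
// The single byte the driver returned for the character in question.
$byte = "\xE1";

// Read as ISO-8859-1 (Latin-1), 0xE1 is "á".
$asLatin1 = iconv("ISO-8859-1", "UTF-8", $byte);

// Read as one of the DOS code pages, the same byte is "ß".
$asCp850 = iconv("CP850", "UTF-8", $byte);
$asCp437 = iconv("CP437", "UTF-8", $byte);

// bin2hex() shows the resulting UTF-8 bytes independent of the terminal.
printf("ISO-8859-1: %s (%s)\n", $asLatin1, bin2hex($asLatin1)); // á (c3a1)
printf("CP850:      %s (%s)\n", $asCp850, bin2hex($asCp850));   // ß (c39f)
printf("CP437:      %s (%s)\n", $asCp437, bin2hex($asCp437));   // ß (c39f)
```

So the bytes in your log are fine; they are just being read with the wrong code page.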

The following statement from your question is in fact correct:

If I execute a query, the returning data is not UTF8.

That's because the data isn't in UTF-8.

What you have in your database is extended ASCII characters. Regular ASCII (code points 0 to 127) is a subset of UTF-8; the extended range (128 to 255) isn't.

If you tried this, it won't work:

iconv("ASCII", "UTF-8", $string);
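To see why, here is a small sketch using the byte sequence from your server log as sample input (the variable names are made up for the example). It assumes the mbstring extension is loaded and a glibc or libiconv based iconv, which rejects bytes above 127 when the source is declared as ASCII.

```php
<?php
// "Talstraße 7" as the database returned it: ß arrives as the single byte 0xE1.
$raw = "Talstra\xE1e 7";

// 0xE1 is outside the 7-bit ASCII range, so iconv() gives up and returns
// false (it also raises a notice, suppressed here with @).
$converted = @iconv("ASCII", "UTF-8", $raw);
var_dump($converted);

// The same bytes are not valid UTF-8 either, which is why they show up
// broken when treated as UTF-8.
var_dump(mb_check_encoding($raw, "UTF-8"));
```

Both calls should report false, which tells you the source encoding has to be named explicitly.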

You can try this first, because it's the least invasive. It looks like MySQL supports cp850, so, assuming Pervasive accepts the same statements, you can try this at the top of your script:

odbc_exec($conn, "SET NAMES 'CP850'");
odbc_exec($conn, "SET client_encoding='CP850'");

This might work, if your original assertion is correct:

iconv("CP437", "UTF-8", $string);

or this, based on my initial hunch that your database is in DOS Latin-1 (CP850):

iconv("CP850", "UTF-8", $string);

IBM CP850 has all the printable characters that ISO-8859-1 (Latin-1) has; it's just that ß is at 225 in CP850 and at 223 in ISO-8859-1.

You can see the position of ß in the table on this page: https://en.wikipedia.org/wiki/Western_Latin_character_sets_%28computing%29

As a drop-in replacement for the code in your question, see if this works:

    // --- 1 ---
    $api->doRequest("http://example.de/api.html?value=" . iconv("CP850", "UTF-8", $row["field"]));
    // --- 2 ---
    $api->doRequest("http://example.de/api.html?value=" . iconv("CP850", "UTF-8", $row["field"]));
    // --- 3 ---
    $api->doRequest("http://example.de/api.html?value=" . iconv("CP850", "UTF-8", $row["field"]));

This will work if your entire database is in the same encoding.
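If you want the conversion in one place, something like the following could work. It is only a sketch: buildApiUrl is a hypothetical helper, CP850 is still a guess at the source encoding, and it assumes the API expects percent-encoded UTF-8. The rtrim() is optional; I added it because your log shows the field arriving padded with spaces.

```php
<?php
/**
 * Hypothetical helper: convert a raw database value to UTF-8 and
 * append it to the API URL as a percent-encoded query value.
 */
function buildApiUrl(string $field, string $sourceEncoding = "CP850"): string
{
    $utf8 = iconv($sourceEncoding, "UTF-8", $field);
    if ($utf8 === false) {
        throw new RuntimeException("Could not convert value from $sourceEncoding");
    }

    // Drop the trailing padding, then percent-encode the UTF-8 bytes.
    return "http://example.de/api.html?value=" . rawurlencode(rtrim($utf8));
}

// Sample input taken from the server log in the question.
echo buildApiUrl("Talstra\xE1e 7      "), "\n";
// prints http://example.de/api.html?value=Talstra%C3%9Fe%207
```

In your loop that would be $api->doRequest(buildApiUrl($row["field"]));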

If your database doesn't consistently adhere to one encoding, it's possible that no single answer is completely right. If that is the case, you can also try the answer here, but with a different encoding:

Latin-1 / UTF-8 encoding php

// If it's not already UTF-8, convert to it
if (mb_detect_encoding($row["field"], 'utf-8', true) === false) {
    $row["field"] = mb_convert_encoding($row["field"], 'utf-8', 'iso-8859-1');
}
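Adapted to the code page guessed above, that idea might look like the sketch below. Note the swaps: it uses mb_check_encoding instead of mb_detect_encoding for the validity test, and iconv instead of mb_convert_encoding for the conversion. toUtf8 is a hypothetical name, and CP850 as the fallback is an assumption about your data.

```php
<?php
/**
 * Hypothetical helper: leave valid UTF-8 alone, otherwise treat the
 * bytes as $fallbackEncoding and convert them.
 */
function toUtf8(string $value, string $fallbackEncoding = "CP850"): string
{
    // Plain ASCII and real UTF-8 both pass this check and stay unchanged.
    if (mb_check_encoding($value, "UTF-8")) {
        return $value;
    }

    $converted = iconv($fallbackEncoding, "UTF-8", $value);

    // If the conversion fails, return the original bytes rather than false.
    return $converted === false ? $value : $converted;
}

echo bin2hex(toUtf8("Talstra\xE1e")), "\n";     // converted from CP850
echo bin2hex(toUtf8("Talstra\xC3\x9Fe")), "\n"; // already UTF-8, unchanged
```

Keep in mind this is a heuristic: a CP850 byte sequence that happens to be valid UTF-8 would slip through unconverted.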

My real answer is: if you can, insert the data as UTF-8 in the first place, so you don't have problems like this. Of course, that is not always possible.

Reference:

Force encode from US-ASCII to UTF-8 (iconv)

Paul Stanley answered Oct 03 '22 00:10


Try adding Client_CSet=UTF-8 to your connection string.

DonBoitnott answered Oct 03 '22 00:10