UTF-8: conversion from ISO-8859-1 works, but not from ISO-8859-2

I have an MS Access database that I query from PHP using PDO with the ODBC driver. It contains French, Danish and Polish words. French and Danish display fine, but I can't get the Polish characters to show up: I only get "?" instead.

Here is the code:

    try {
        $db = new PDO("odbc:DRIVER={Microsoft Access Driver (*.mdb, *.accdb)}; DBQ=$dbName; Uid=Admin;Pwd=;");
    } catch (PDOException $e) {
        echo $e->getMessage();
    }
    $answer = $db->query("SELECT * FROM dict_main WHERE ID < 20");
    while ($data = $answer->fetch()) {
        echo iconv("iso-8859-1", "utf-8", htmlspecialchars($data['DK'])) . ' ';
        echo iconv("iso-8859-2", "utf-8", htmlspecialchars($data['PL'])) . ' ';
        echo iconv("iso-8859-1", "utf-8", htmlspecialchars($data['FR'])) . ' ';
    }

Please let me know if anyone has an idea (I'm running out of them and nothing seems to work), or if there is more information about the problem that I should provide.

asked by George, Jul 05 '13 18:07





2 Answers

It looks like htmlspecialchars() does not support ISO-8859-2, so it probably breaks the contents of $data['PL'] before they reach iconv().

Try first converting the input string into UTF-8, then apply htmlspecialchars() to the UTF-8 string:

echo htmlspecialchars(iconv("iso-8859-2", "utf-8", $data['PL']), ENT_QUOTES, 'UTF-8');
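To illustrate why the order matters, here is a minimal self-contained sketch. It assumes a hypothetical column value "Łódź & co" stored as raw ISO-8859-2 bytes and requires the iconv extension. When htmlspecialchars() is told to treat its input as UTF-8 (the default since PHP 5.4), raw ISO-8859-2 bytes are invalid UTF-8, and without ENT_SUBSTITUTE or ENT_IGNORE the function returns an empty string.

```php
<?php
// Hypothetical DB value "Łódź & co" as raw ISO-8859-2 bytes (Ł = 0xA3, ó = 0xF3, ź = 0xBC).
$pl = "\xA3\xF3d\xBC & co";

// Right order: convert to UTF-8 first, then escape it as UTF-8.
$utf8 = iconv('ISO-8859-2', 'UTF-8', $pl);
echo htmlspecialchars($utf8, ENT_QUOTES, 'UTF-8'), "\n"; // Łódź &amp; co

// Wrong order: escaping the raw ISO-8859-2 bytes as UTF-8 rejects the invalid input.
var_dump(htmlspecialchars($pl, ENT_QUOTES, 'UTF-8')); // string(0) ""
```

Passing the charset explicitly avoids depending on the PHP version's default.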
answered by RandomSeed, Sep 23 '22 15:09



You are using PHP 5.3.13, so I would expect the charset option in the new PDO DSN to do its job (prior to 5.3.6 you would have had to use $db->exec("set names utf8");). So add charset=utf8; to your connection string. I also assume your Access database is UTF-8.

You can also try charset=ucs2;, both with and without wrapping the value in htmlspecialchars(iconv("iso-8859-2", "utf-8", $data['PL'])):

$db = new PDO("odbc:DRIVER={Microsoft Access Driver (*.mdb, *.accdb)}; DBQ=$dbName; Uid=Admin;Pwd=;charset=utf8;");

or

$db = new PDO("odbc:DRIVER={Microsoft Access Driver (*.mdb, *.accdb)}; DBQ=$dbName; Uid=Admin;Pwd=;charset=ucs2;");
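When trying several connection strings, the driver may hand back UTF-8 in one configuration and single-byte ISO-8859-2 in another. Below is a minimal sketch of a hypothetical helper, toUtf8(), that converts only when the value is not already valid UTF-8, so the same output code works for both attempts. It assumes the mbstring and iconv extensions are available, and it is a heuristic: ASCII-only text is valid in both encodings (which is harmless), but some rare ISO-8859-2 byte sequences also happen to be valid UTF-8. It cannot restore characters the driver has already replaced with "?".

```php
<?php
// Hypothetical helper: return $s unchanged if it is already valid UTF-8,
// otherwise assume it is in $fallback (ISO-8859-2 here) and convert it.
// Requires mbstring (mb_check_encoding) and iconv.
function toUtf8($s, $fallback = 'ISO-8859-2') {
    if (mb_check_encoding($s, 'UTF-8')) {
        return $s;
    }
    return iconv($fallback, 'UTF-8', $s);
}

$fromUtf8Driver   = "\xC5\x81\xC3\xB3d\xC5\xBA"; // "Łódź" already in UTF-8
$fromLatin2Driver = "\xA3\xF3d\xBC";             // "Łódź" in ISO-8859-2

echo toUtf8($fromUtf8Driver), "\n";   // Łódź
echo toUtf8($fromLatin2Driver), "\n"; // Łódź
```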

By the way: don't forget to declare your output as UTF-8 at the top of your document:

<?php header('Content-Type: text/html; charset=UTF-8'); ?>

and/or

<meta http-equiv='Content-Type' content='text/html; charset=utf-8'>

If that still doesn't work, I suspect the encoding in your Access database itself is messed up.
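One way to check that suspicion is to print the raw bytes of a Polish value before any conversion, for example hexBytes($data['PL']) inside the loop from the question. The sketch below uses a hypothetical helper and hard-coded sample strings instead of a live database. In ISO-8859-2 the word "łódź" is b3 f3 64 bc; in UTF-8 the letter ł would show up as c5 82. Bytes of 3f mean the letters were already replaced with a literal "?" somewhere between Access and PHP, and no iconv() call can bring them back.

```php
<?php
// Hypothetical helper: format the raw bytes of a string as space-separated hex pairs.
function hexBytes($s) {
    return implode(' ', str_split(bin2hex($s), 2));
}

// Sample "łódź" as ISO-8859-2 bytes, i.e. what a Latin-2 driver would return.
$latin2 = "\xB3\xF3d\xBC";
// The same word after a lossy conversion to a code page such as Windows-1252,
// where ó exists (0xF3) but ł and ź do not and become '?'.
$lossy = "?\xF3d?";

echo hexBytes($latin2), "\n"; // b3 f3 64 bc
echo hexBytes($lossy), "\n";  // 3f f3 64 3f
```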


Edit:

The only other thing I can think of is using odbc_connect() directly and bypassing PDO. But I think the problem lies in the ODBC layer (Access -> ODBC), and if that's the case, this won't help either:

$conn = odbc_connect("DRIVER={Microsoft Access Driver (*.mdb, *.accdb)}; DBQ=$dbName; Uid=Admin;Pwd=;charset=utf8", "", "") or die(odbc_errormsg());
$rs = odbc_exec($conn, "SELECT * FROM dict_main WHERE ID < 20");
odbc_result_all($rs, "border=1"); // dumps the whole result set as an HTML table
answered by Rik, Sep 25 '22 15:09
