I'm having trouble reading a text file (saved as Unicode, UTF-16LE) in my PHP script.
The PHP script itself is saved, for various reasons, as UTF-8.
Here is my code:
$lines = file("./somedir/$filename");
for ($i=0; $i < count($lines); $i++) {
$lines[$i] = iconv("Unicode", "UTF-8", $lines[$i]); // converting to UTF8
}
echo "[0]:".$lines[0]; // outputs CORRECT text (like "This is the first line")
echo "[1]:".$lines[1]; // outputs something like çæ¤ææ¬çææ¸ææ°ã
Any ideas, please?
I checked the value of count($lines) and it's perfectly correct...
Thanks.
EDIT:
OK so I tried iconv("UTF-16", "UTF-8", $lines[$i]);
I also tried iconv("UTF-16LE", "UTF-8", $lines[$i]);
But still no success...
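PHP's file function cannot correctly read files encoded as UTF-16LE. It splits lines on the single byte 0x0A, but in UTF-16LE a line feed is the two-byte sequence 0x0A 0x00 (and the encoding is variable-length: characters outside the Basic Multilingual Plane take four bytes). The split therefore falls in the middle of a code unit, and every line after the first starts with a stray 0x00 byte. UTF-16LE is simply incompatible with the line-splitting procedure built into the file function.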
So you are using the wrong function for the job. That is the simple answer: the problem here is not iconv, but the use of file.
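To see why the first line converts cleanly while the second turns into garbage, here is a minimal sketch that mimics the byte-level split. It builds the UTF-16LE data in memory (the sample text is made up) and uses explode as a rough stand-in for what file does internally:

```php
<?php
// Two lines of UTF-16LE text, built in memory (no BOM, to keep the bytes easy to read).
$utf16 = iconv('UTF-8', 'UTF-16LE', "first\nsecond\n");

// Splitting on the single byte 0x0A approximates what file() does.
$parts = explode("\n", $utf16);

// Part 0 is intact: five complete two-byte code units.
echo bin2hex($parts[0]), PHP_EOL; // 66006900720073007400

// Part 1 starts with the orphaned 0x00 left over from the line feed (0A 00),
// so every following code unit is shifted by one byte.
echo bin2hex($parts[1]), PHP_EOL; // 007300650063006f006e006400
```

Decoding the second part as UTF-16LE pairs each 0x00 with the wrong neighbour, which is how plain ASCII text ends up looking like CJK characters.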
Instead, you need to read the file into a buffer, take one line after another out of the buffer, and then re-encode each line to UTF-8.
That starts with finding out which line separator the file uses. As PHP's file functions (and its string functions, as well as strings themselves) are binary-based, express the separator's byte sequence as a string and use the strpos function to locate it.
Then split the buffer line by line (refilling it from the file when it runs out of bytes) and use iconv as outlined on the manual page (or as in your question; your example code does not look wrong, just make sure you pass the right parameters so the encodings are correct).
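The steps above could be sketched roughly as follows. The function name read_utf16le_lines, the chunk size, and the sample data are illustrative assumptions, not part of the answer. The sketch also assumes the file really is UTF-16LE with LF or CRLF line endings:

```php
<?php
// Hypothetical helper (not from the answer): returns the lines of a UTF-16LE file as UTF-8.
function read_utf16le_lines(string $path, int $chunkSize = 8192): array
{
    $handle = fopen($path, 'rb');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }

    $newline = "\n\x00"; // U+000A as UTF-16LE bytes
    $decode = static function (string $raw): string {
        $utf8 = iconv('UTF-16LE', 'UTF-8', $raw);
        if ($utf8 === false) {
            throw new RuntimeException('Invalid UTF-16LE data');
        }
        return rtrim($utf8, "\r"); // drop the CR of a CRLF line ending
    };

    // Skip a leading byte order mark (FF FE) if there is one.
    $buffer = (string) fread($handle, 2);
    if ($buffer === "\xFF\xFE") {
        $buffer = '';
    }

    $lines = [];
    do {
        $buffer .= (string) fread($handle, $chunkSize); // refill the buffer
        $from = 0;
        while (($pos = strpos($buffer, $newline, $from)) !== false) {
            if ($pos % 2 !== 0) {
                // The match straddles two code units, so it is not a real line feed.
                $from = $pos + 1;
                continue;
            }
            $lines[] = $decode(substr($buffer, 0, $pos));
            $buffer = substr($buffer, $pos + 2);
            $from = 0;
        }
    } while (!feof($handle));
    fclose($handle);

    if ($buffer !== '') {
        $lines[] = $decode($buffer); // last line without a trailing newline
    }
    return $lines;
}

// Usage with throwaway sample data: BOM plus two CRLF-terminated lines.
$path = tempnam(sys_get_temp_dir(), 'u16');
file_put_contents($path, "\xFF\xFE" . iconv('UTF-8', 'UTF-16LE', "This is the first line\r\nh\u{00e9}llo\r\n"));
$lines = read_utf16le_lines($path);
unlink($path);
print_r($lines);
```

The even-offset check matters because the bytes 0A 00 can also appear across the boundary of two unrelated code units. For files that fit comfortably in memory, converting the whole contents with iconv first and splitting afterwards would be a simpler alternative.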
The following code works for me.
Just use the function fopen_utf8 below instead of fopen.
<?php
# http://www.practicalweb.co.uk/blog/2008/05/18/reading-a-unicode-excel-file-in-php/
function fopen_utf8($filename){
    $encoding = '';
    $handle = fopen($filename, 'r');
    if ($handle === false) {
        return false; // could not open the file
    }
    $bom = fread($handle, 2);
    rewind($handle);
    if ($bom === chr(0xff).chr(0xfe) || $bom === chr(0xfe).chr(0xff)) {
        // UTF-16 byte order mark present
        $encoding = 'UTF-16';
    } else {
        // read the first 1000 bytes; appending 'e' is a workaround for an mbstring bug
        // ('.' concatenates strings; '+' would be numeric addition)
        $file_sample = fread($handle, 1000) . 'e';
        rewind($handle);
        $encoding = mb_detect_encoding($file_sample, 'UTF-8, UTF-7, ASCII, EUC-JP,SJIS, eucJP-win, SJIS-win, JIS, ISO-2022-JP');
    }
    if ($encoding) {
        // convert to UTF-8 transparently on every read from the handle
        stream_filter_append($handle, 'convert.iconv.'.$encoding.'/UTF-8');
    }
    return $handle;
}
?>
Source: the practicalweb.co.uk blog post linked in the code comment above.
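For a file that is known to be UTF-16LE, the same stream-filter idea can be tried without any detection step. This is only a sketch with made-up sample data. It names the byte order explicitly and strips the converted BOM by hand, on the assumption that not every iconv implementation handles the BOM the same way for the generic "UTF-16" name:

```php
<?php
// Throwaway sample file: BOM followed by two lines of UTF-16LE text.
$path = tempnam(sys_get_temp_dir(), 'u16');
file_put_contents($path, "\xFF\xFE" . iconv('UTF-8', 'UTF-16LE', "first line\nsecond line\n"));

$handle = fopen($path, 'rb');
if ($handle === false) {
    throw new RuntimeException("Cannot open $path");
}

// Convert UTF-16LE to UTF-8 on the fly for everything read from this handle.
stream_filter_append($handle, 'convert.iconv.UTF-16LE/UTF-8', STREAM_FILTER_READ);

$lines = [];
while (($line = fgets($handle)) !== false) {
    $lines[] = rtrim($line, "\r\n"); // fgets now sees UTF-8, so "\n" is a single byte again
}
fclose($handle);
unlink($path);

// If the BOM survived conversion as U+FEFF (EF BB BF in UTF-8), drop it from the first line.
if (isset($lines[0]) && strncmp($lines[0], "\xEF\xBB\xBF", 3) === 0) {
    $lines[0] = substr($lines[0], 3);
}

print_r($lines);
```

Because the filter decodes before fgets looks for line endings, the ordinary line-based functions work again on the converted stream.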