How to read a Unicode text file in PHP?

Question

I am having trouble reading a text file (saved as Unicode, UTF-16LE) in my PHP script.

My PHP script itself is saved (for certain reasons) in UTF-8.

Here is my code:

$lines = file("./somedir/$filename");

for ($i=0; $i < count($lines); $i++) {
    $lines[$i] = iconv("Unicode", "UTF-8", $lines[$i]); // converting to UTF8
}

echo "[0]:".$lines[0]; // outputs CORRECT text (like "This is the first line")
echo "[1]:".$lines[1]; // outputs something like çæ¤ææ¬çææ¸ææ°ã

Any ideas, please? I checked the value of count($lines) and it is correct. Thanks.

EDIT:
OK so I tried iconv("UTF-16", "UTF-8", $lines[$i]);
I also tried iconv("UTF-16LE", "UTF-8", $lines[$i]);
But still no success...

hakre · Accepted Answer

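PHP's file function cannot correctly read files encoded as UTF-16LE. It has to split on the line-ending character, but PHP only supports single-byte sequences here, and UTF-16LE is a multibyte, variable-length encoding that is incompatible with the line splitting built into the file function. In UTF-16LE a line feed is the two bytes 0x0A 0x00; file splits right after the 0x0A, so the 0x00 ends up at the start of the next line and every following byte pair is misaligned. That is why the first line converts correctly and the second one does not.

So you are using the wrong function for the job; the answer is that simple. The problem here is not iconv, but the use of file.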

Instead, you need to read the file into a buffer, take one line after another out of the buffer, and then re-encode each line to UTF-8.

That starts with finding out which line separator the file uses. Because PHP's file functions (and its string functions, as well as the strings themselves) are binary-based, express the separator's byte sequence as a string and use the strpos function to locate it.

Then split line by line out of the buffer (refilling the buffer from the file when it runs out of bytes) and use iconv as outlined in the manual page or in your question. The example code you have does not look wrong; just take care to use the right parameters so the encodings are correct.
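The steps described above could be sketched roughly as follows. This is only an illustration, not code from the answer: the helper name utf16le_lines_to_utf8 is hypothetical, it assumes the separator is a plain line feed (bytes 0x0A 0x00), and for brevity it works on a string that already holds the whole file instead of refilling a buffer. It only accepts a separator found at an even byte offset, so that a 0x0A 0x00 pair straddling two characters is not mistaken for a newline.

```php
<?php
// Hypothetical helper: split a UTF-16LE binary string into lines
// and convert each line to UTF-8 (line endings are kept, as file() does).
function utf16le_lines_to_utf8(string $data): array
{
    // Drop a leading UTF-16LE byte order mark (FF FE) if present.
    if (strncmp($data, "\xFF\xFE", 2) === 0) {
        $data = substr($data, 2);
    }

    $newline = "\n\x00"; // U+000A encoded as UTF-16LE (assumed separator)
    $lines   = [];
    $start   = 0;        // byte offset where the current line begins
    $search  = 0;        // byte offset where the next strpos() starts

    while (($pos = strpos($data, $newline, $search)) !== false) {
        if ($pos % 2 !== 0) {
            // Match straddles two 16-bit code units: not a real newline.
            $search = $pos + 1;
            continue;
        }
        $end     = $pos + 2; // include the two newline bytes
        $lines[] = iconv('UTF-16LE', 'UTF-8', substr($data, $start, $end - $start));
        $start   = $search = $end;
    }

    // Trailing text without a final newline.
    if ($start < strlen($data)) {
        $lines[] = iconv('UTF-16LE', 'UTF-8', substr($data, $start));
    }
    return $lines;
}
```

With the code from the question, this might be called as utf16le_lines_to_utf8(file_get_contents("./somedir/$filename")). For very large files, reading in chunks as described above would be the more memory-friendly variant.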

Dubbo · Answer

The following code works for me. Use the function fopen_utf8 below in place of fopen:

<?php
# http://www.practicalweb.co.uk/blog/2008/05/18/reading-a-unicode-excel-file-in-php/
function fopen_utf8($filename){
    $encoding = '';
    $handle = fopen($filename, 'r');
    if ($handle === false) {
        return false; // could not open the file
    }
    $bom = fread($handle, 2); // peek at the first two bytes
    rewind($handle);

    if ($bom === chr(0xff).chr(0xfe) || $bom === chr(0xfe).chr(0xff)) {
        // UTF-16 byte order mark present (little- or big-endian)
        $encoding = 'UTF-16';
    } else {
        $file_sample = fread($handle, 1000) . 'e'; // read first 1000 bytes
        // . 'e' is a workaround for mb_string bug
        rewind($handle);

        $encoding = mb_detect_encoding($file_sample, 'UTF-8, UTF-7, ASCII, EUC-JP,SJIS, eucJP-win, SJIS-win, JIS, ISO-2022-JP');
    }
    if ($encoding) {
        // convert to UTF-8 transparently on every subsequent read
        stream_filter_append($handle, 'convert.iconv.'.$encoding.'/UTF-8');
    }
    return $handle;
}
?>

From this website
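To illustrate how a handle with such a conversion filter can be consumed, here is a small self-contained sketch. It is not part of the original answer: it writes a throwaway UTF-16LE file with a BOM to the temp directory, attaches the same kind of convert.iconv filter directly, and reads lines with fgets. It assumes the iconv extension is available and that the underlying iconv library recognises the BOM when asked for plain UTF-16.

```php
<?php
// Illustrative only: create a small UTF-16LE file with a byte order mark.
$path = tempnam(sys_get_temp_dir(), 'u16');
file_put_contents(
    $path,
    "\xFF\xFE" . iconv('UTF-8', 'UTF-16LE', "first line\nsecond line\n")
);

$handle = fopen($path, 'r');
// Attach the conversion filter on the read side of the stream.
stream_filter_append($handle, 'convert.iconv.UTF-16/UTF-8', STREAM_FILTER_READ);

// fgets() now sees UTF-8 text, so splitting on "\n" works as usual.
$lines = [];
while (($line = fgets($handle)) !== false) {
    $lines[] = $line;
}

fclose($handle);
unlink($path); // remove the temporary file
```

Because the conversion happens inside the stream, the line-oriented functions operate on bytes that are already UTF-8, which avoids the misaligned split seen with file on raw UTF-16LE data.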

Tags: php, file-io, unicode

Enriqe
