
Reading a file with the right encoding

Tags:

php

encoding

I have a txt file. If I open it with a standard text editor such as Notepad or SciTE, I can read strings like these:

Artist1 – Title 1
Artist2 – Title 2

Then I open it with my PHP script and read the lines:

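// Build a unique name for the upload, keeping the original extension.
$tracklistFile_name = time() . rand(1, 1000) . "." . pathinfo($_FILES['tracklistFile']['name'], PATHINFO_EXTENSION);
// Accept only .txt uploads and move them into import/.
if (pathinfo($tracklistFile_name, PATHINFO_EXTENSION) == 'txt'
    && move_uploaded_file($_FILES['tracklistFile']['tmp_name'], 'import/' . $tracklistFile_name)) {
    // Read the file into an array of lines, then drop whitespace-only lines.
    $fileArray = file('import/' . $tracklistFile_name, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    $fileArray = array_values(array_filter($fileArray, "trim"));

    // Print each line as-is (no encoding conversion happens here).
    foreach ($fileArray as $line) {
        echo $line . "<br />";
    }
}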

and I get this result:

Artist1 � Title1 
Artist2 � Title2 

What are those symbols? I think the encoding is wrong. The characters are so broken that I can't insert them into the database, not even with mysql_real_escape_string(). In fact, I get this error when I try to insert them:

Incorrect string value: '\x96 Titl...' for column 'atl' at row 1

How can I resolve this problem? Suggestions?

EDIT

I tried calling utf8_encode() on these strings before inserting them. Now the insert doesn't fail, but the result is:

Artist1  Title1 
Artist2  Title2

So I've lost information. Why?

markzzz asked Apr 26 '11 20:04



1 Answer

You should read Joel Spolsky's article on UTF-8 and encoding.

Your problem almost certainly stems from an encoding mismatch. Your first job is to figure out where the mismatch occurs, because it could be in several different places:

1) Your PHP code could be reading the input using an incorrect encoding (for example, treating the file as ISO-8859-1 when the source file is actually encoded some other way).

2) Your PHP code could be writing the output using an incorrect encoding.

3) Whatever you are using to read the output (your browser) could be set to a different encoding than the bytes you are writing.
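To narrow down which case applies, it can help to look at the raw bytes PHP actually received. The following is a minimal sketch, assuming the mbstring extension is available; `describeBytes` is a hypothetical helper name, and the sample line is built by hand around the `\x96` byte that appears in the MySQL error message.

```php
<?php
// Hypothetical helper: report whether a line is valid UTF-8
// and list its non-ASCII bytes as hex strings.
function describeBytes(string $line): array
{
    // No /u modifier, so the pattern matches single bytes, not characters.
    preg_match_all('/[\x80-\xFF]/', $line, $matches);

    return [
        'valid_utf8' => mb_check_encoding($line, 'UTF-8'),
        'non_ascii'  => array_map('bin2hex', $matches[0]),
    ];
}

// Sample line containing the raw 0x96 byte from the error message.
$sample = "Artist1 \x96 Title 1";
print_r(describeBytes($sample));
```

A lone byte such as 0x96 is not valid UTF-8, which suggests the file was saved in a single-byte encoding. In Windows-1252, 0x96 is the en dash, so that encoding is a plausible candidate here, though the file itself is the only real evidence.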

Once you figure out which of the three places is causing the problem, you can fix it by determining what your source encoding actually is and then reading and writing with that encoding, instead of whatever encoding your system uses by default.
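If the mismatch turns out to be on the output side (cases 2 and 3), one option is to state the encoding explicitly instead of letting the browser guess. This is only a sketch, assuming the strings are already UTF-8 at this point; `renderLines` is a hypothetical helper, not part of the original script.

```php
<?php
// Hypothetical helper: build the HTML for a list of UTF-8 lines.
function renderLines(array $lines): string
{
    $html = '';
    foreach ($lines as $line) {
        // Escape markup characters, telling PHP the input is UTF-8.
        $html .= htmlspecialchars($line, ENT_QUOTES, 'UTF-8') . "<br />\n";
    }
    return $html;
}

// Declare the charset so the browser does not have to guess
// (skipped if output has already started).
if (!headers_sent()) {
    header('Content-Type: text/html; charset=UTF-8');
}

// Sample lines with the en dash written as its UTF-8 byte sequence.
echo renderLines(["Artist1 \xE2\x80\x93 Title 1", "Artist2 \xE2\x80\x93 Title 2"]);
```

The same idea applies to the database connection, which has its own character set setting that should match the bytes you send.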

EDIT: I don't know PHP well, but it looks like you could use mb_detect_encoding and possibly also mb_convert_encoding.
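As a rough illustration of the conversion step, here is a minimal sketch that assumes the mbstring extension is loaded and that the file is Windows-1252. That is a guess based on the `\x96` byte in the error, not something the question confirms. Because encoding detection is heuristic, the sketch swaps mb_detect_encoding for a stricter mb_check_encoding test; `toUtf8` is a hypothetical helper name.

```php
<?php
// Hypothetical helper: return $line as UTF-8.
// ASSUMPTION: anything that is not already valid UTF-8 is Windows-1252.
function toUtf8(string $line, string $fallback = 'Windows-1252'): string
{
    if (mb_check_encoding($line, 'UTF-8')) {
        return $line; // already valid UTF-8, leave it untouched
    }
    return mb_convert_encoding($line, 'UTF-8', $fallback);
}

// Sample line containing the raw 0x96 byte from the error message.
$raw  = "Artist1 \x96 Title 1";
$utf8 = toUtf8($raw);

// Show the converted bytes in hex so the change is visible.
echo bin2hex($utf8), "\n";
```

This would also explain the lost character after utf8_encode(): that function treats its input as ISO-8859-1, where 0x96 is an invisible control character instead of a dash.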

Paul Sanwald answered Sep 23 '22 09:09