 

How to encode multibyte filenames in PHP and decode them in JavaScript?

Here are some example filenames:

漢語.jpg (Chinese)
Федерация.jpg (Russian)
AbÇöişÜĞ.jpg (Turkish, ISO-8859-9)
...

I have tried rawurlencode(mb_convert_encoding($file, "UTF-8", mb_detect_encoding($file))), but it does not work: all Chinese and Russian characters are printed as %3F (a regular question mark) and all Turkish characters are removed.

I am testing on Windows, PHP 5.3.

The only solution I found is to specify the source encoding explicitly: rawurlencode(mb_convert_encoding($file, "UTF-8", "ISO-8859-9")). This works only for Turkish characters.

By the way, mb_detect_encoding($file) always returns "UTF-8" for the files above.

EDIT:
After running the following code, I think mb_convert_encoding() can't solve my problem:

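// Recursively walk the "mp" directory, using forward slashes in paths
$iterator = new RecursiveIteratorIterator(new RecursiveDirectoryIterator("mp", FilesystemIterator::UNIX_PATHS));
// Keep only image files; with GET_MATCH each item is the match array, so $file[0] is the full path
$iterator = new RegexIterator($iterator, '/^.+\.(gif|jpg|jpeg|png)$/i', RegexIterator::GET_MATCH);

foreach ($iterator as $file)
{
    // Try every encoding mbstring supports as the source encoding and dump the result
    foreach (mb_list_encodings() as $encoding)
        var_dump(rawurlencode(mb_convert_encoding($file[0], "UTF-8", $encoding)) . " : " . $encoding);
}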

I guess this is an encoding issue, but I don't know how to fix it.

ahk asked Apr 05 '12 15:04


1 Answer

The main point is that most transports (network, files, RPC) work with bytes, not characters. URL encoding (%FF) also expects its input to be bytes: each %XX escape stands for exactly one byte.
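To make the byte-per-escape point concrete, here is a minimal JavaScript sketch that turns a percent-encoded string back into raw bytes. The helper name percentDecodeToBytes is hypothetical (not part of any library), and it assumes ASCII-only input with well-formed %XX escapes. Every escape yields exactly one byte, so a single non-ASCII character can need several escapes.

```javascript
// Hypothetical helper: decode a percent-encoded string into raw bytes.
// Assumes ASCII-only input with well-formed %XX escapes.
function percentDecodeToBytes(encoded) {
  const bytes = [];
  for (let i = 0; i < encoded.length; i++) {
    if (encoded[i] === "%") {
      // Each %XX escape is exactly one byte (two hex digits)
      bytes.push(parseInt(encoded.slice(i + 1, i + 3), 16));
      i += 2; // skip the two hex digits
    } else {
      // Unescaped ASCII characters are one byte each
      bytes.push(encoded.charCodeAt(i));
    }
  }
  return Uint8Array.from(bytes);
}

// "%E6%BC%A2" is three escapes, so three bytes, yet it represents one character
const bytes = percentDecodeToBytes("%E6%BC%A2.jpg");
console.log(bytes.length); // 7
```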

So what you need to do is use UTF-8. It turns each character into a sequence of one to four bytes (ASCII characters stay a single byte), giving you a plain byte string. On that string you can do everything you would normally do with ASCII, including URL encoding.
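As a rough illustration in JavaScript (the PHP side is not shown), TextEncoder produces the UTF-8 bytes of each sample filename, and encodeURIComponent percent-encodes those bytes. For these particular names the output should match what PHP's rawurlencode returns for a UTF-8 input string, although the two functions differ on a few ASCII punctuation characters such as ! and *.

```javascript
// Sample filenames from the question
const names = ["漢語.jpg", "Федерация.jpg", "AbÇöişÜĞ.jpg"];
const encoder = new TextEncoder(); // always encodes to UTF-8

for (const name of names) {
  const utf8 = encoder.encode(name); // 1 to 4 bytes per character
  // encodeURIComponent emits one %XX escape per UTF-8 byte of each non-ASCII character
  console.log(name, "chars:", [...name].length, "bytes:", utf8.length, encodeURIComponent(name));
}
```

For example, 漢語.jpg comes out as %E6%BC%A2%E8%AA%9E.jpg: two characters, six escapes.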

Explicitly set the internal encoding for PHP:

mb_internal_encoding("UTF-8");

Now the mb_* functions will treat your strings as UTF-8 by default. Note that UTF-8 is a multi-byte encoding, not a single-byte one, and this setting does not re-encode strings or filenames you already have; it only changes the encoding the mb_* functions assume. Once a filename is UTF-8, you can echo it as-is and it will reach the transport as encoded data. From JavaScript, all you have to do is send a request using AJAX, and the response will be decoded for you automatically, ready to use in the browser :) Just make sure you set the content type in your HTML file, since the browser uses that charset to decode the page and, by default, its scripts.
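A minimal sketch of the decoding step in JavaScript, assuming the server sends each name percent-encoded from UTF-8 bytes (for example, rawurlencode applied to a UTF-8 string). decodeURIComponent reassembles the bytes and interprets them as UTF-8. The helper name decodeFilename is hypothetical. The second call shows why the bytes must be UTF-8: %C7 is Ç in ISO-8859-9, but a lone 0xC7 byte is not valid UTF-8, so decodeURIComponent throws a URIError.

```javascript
// Hypothetical helper: turn a percent-encoded filename back into a string.
// Assumes the escapes encode UTF-8 bytes; returns null if they do not.
function decodeFilename(encoded) {
  try {
    return decodeURIComponent(encoded);
  } catch (err) {
    if (err instanceof URIError) return null; // bytes were not valid UTF-8
    throw err;
  }
}

console.log(decodeFilename("%E6%BC%A2%E8%AA%9E.jpg")); // 漢語.jpg
// "%C7" is Ç in ISO-8859-9, but on its own it is not valid UTF-8
console.log(decodeFilename("Ab%C7.jpg")); // null
// %3F is just "?", so a name already replaced by question marks cannot be recovered
console.log(decodeFilename("%3F%3F.jpg")); // ??.jpg
```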

<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
Colin Godsey answered Oct 20 '22 00:10