How to open file in PHP that has unicode characters in its name?

For example, I have a file named проба.xml and I am unable to open it from a PHP script.

If I save the PHP script as UTF-8, then all the text in the script is UTF-8, so when I pass this to file_get_contents:

$fname = "проба.xml";
file_get_contents($fname);

I get an error that the file does not exist. The reason is that in Windows (XP) all file names with non-Latin characters are Unicode (UTF-16). OK, so I tried this:

$fname = "проба.xml";
$res = mb_convert_encoding($fname,'UTF-8','UTF-16');
file_get_contents($res);

But the error persists, since file_get_contents cannot accept Unicode strings...

Any suggestions?

Darko Miletic asked Jun 10 '09 19:06



1 Answer

UPDATE (July 13 '17)

Although the docs do not seem to mention it prominently, PHP 7.1 and above finally support Unicode filenames on Windows out of the box. PHP's filesystem APIs accept and return filenames according to default_charset, which is UTF-8 by default.

Refer to bug fix here: https://github.com/php/php-src/commit/3d3f11ede4cc7c83d64cc5edaae7c29ce9c6986f
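
As a rough illustration of what this means in practice, here is a minimal round-trip sketch. It assumes a PHP version with the support described above, a script file saved as UTF-8, and default_charset left at UTF-8. The helper name roundTripUnicodeFilename and the use of the temp directory are illustrative choices, not something taken from the PHP docs.

```php
<?php
// Hypothetical helper: writes a file with a non-ASCII name, reads it back,
// and removes it again. Assumes this script is saved as UTF-8.
function roundTripUnicodeFilename(string $name, string $payload): string
{
    $path = sys_get_temp_dir() . DIRECTORY_SEPARATOR . $name;

    if (file_put_contents($path, $payload) === false) {
        throw new RuntimeException("Could not write $path");
    }

    $contents = file_get_contents($path);
    unlink($path); // clean up the temporary file

    if ($contents === false) {
        throw new RuntimeException("Could not read $path");
    }
    return $contents;
}
```

Calling roundTripUnicodeFilename('проба.xml', '<root/>') should hand back the payload unchanged if the filename is handled correctly; checking ini_get('default_charset') first tells you which encoding PHP expects for the name.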


UPDATE (Jan 29 '15)

If you have access to the PHP extensions directory, you can try installing php-wfio.dll from https://github.com/kenjiuno/php-wfio and referring to files via the wfio:// protocol.

file_get_contents("wfio://你好.xml");
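
If the same script has to run both with and without the extension, one option is to add the prefix only when the wrapper is present. This is a sketch under the assumption that php-wfio registers a stream wrapper named "wfio", as the wfio:// URL above implies; the helper name withWfioIfAvailable is made up for illustration. It avoids type declarations so it also runs on the older PHP versions where this extension is relevant.

```php
<?php
// Hypothetical helper: prefix a UTF-8 path with wfio:// only when a stream
// wrapper called "wfio" is registered (assumed name, see note above).
// $wrappers can be passed explicitly; by default the live list is used.
function withWfioIfAvailable($utf8Path, $wrappers = null)
{
    if ($wrappers === null) {
        $wrappers = stream_get_wrappers();
    }
    if (in_array('wfio', $wrappers, true)) {
        return 'wfio://' . $utf8Path;
    }
    return $utf8Path; // fall back to the plain path
}
```

Usage would then look like file_get_contents(withWfioIfAvailable("你好.xml")).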

Original Answer

PHP on Windows uses the legacy "ANSI" APIs exclusively for local file access, which means PHP uses the system locale's code page instead of Unicode.

To access files whose filenames contain Unicode, you must convert the filename to the encoding specified by the current system locale. If the filename contains characters that are not representable in that encoding, you're out of luck (Update: see the sections above for a solution). scandir will return gibberish for these files, and passing that string back to fopen and equivalents will fail.

To find the right encoding to use, get the system locale by calling <?=setlocale(LC_CTYPE,0)?> and look up the Code Page Identifier (the number after the .) in the MSDN article at https://msdn.microsoft.com/en-us/library/dd317756(VS.85).aspx.

For example, if the function returns Chinese (Traditional)_HKG.950, this means that code page 950 is in use and the filename should be converted to the Big-5 encoding. In that case, your code will have to be as follows if your file is saved in UTF-8 (preferably without a BOM):

$fname = iconv('UTF-8','big-5',"你好.xml");
file_get_contents($fname);

or as follows if you directly save the file as Big-5:

$fname = "你好.xml";
file_get_contents($fname);
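
The lookup and conversion steps above can be combined into a small helper. This is only a sketch: the function names codePageFromLocale and toSystemFilename are hypothetical, and it assumes your iconv build recognises encoding names of the form "CP<number>" (for example CP950 or CP1251), which varies between builds. It is written without type declarations so it also runs on the older PHP versions this workaround targets.

```php
<?php
// Hypothetical helper: "Chinese (Traditional)_HKG.950" -> "CP950".
// Returns null when the locale string has no numeric code page suffix
// (for example "C").
function codePageFromLocale($locale)
{
    if (preg_match('/\.(\d+)$/', $locale, $m) === 1) {
        return 'CP' . $m[1];
    }
    return null;
}

// Hypothetical helper: converts a UTF-8 filename to the encoding named by
// the locale string. Assumes iconv accepts the "CP<number>" encoding name.
function toSystemFilename($utf8Name, $locale)
{
    $encoding = codePageFromLocale($locale);
    if ($encoding === null) {
        return $utf8Name; // no code page found: pass the name through
    }

    // iconv returns false when a character cannot be represented in the
    // target encoding or the encoding name is unknown to this build.
    $converted = @iconv('UTF-8', $encoding, $utf8Name);
    if ($converted === false) {
        throw new RuntimeException("Cannot convert filename to $encoding");
    }
    return $converted;
}
```

With a script saved as UTF-8, the call would be file_get_contents(toSystemFilename("你好.xml", setlocale(LC_CTYPE, 0))). The exception makes the "out of luck" case explicit instead of silently passing a mangled name to the filesystem.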

Henry answered Sep 19 '22 09:09