I am using jqueryFileTree to show a directory listing on the server with download links to the files in the directory. Recently I've run into an issue with files which contain special characters:
When debugging the PHP connector of jqueryFileTree, I see that it does a scandir() of the directory passed via $_GET and then loops over each file/dir in that directory. Before putting the file name into the URL, the script correctly runs htmlentities() over it. The problem is that this htmlentities($file) call just returns an empty string, which according to the PHP docs can happen when the input string contains an invalid code unit sequence within the given encoding. However, I tried passing the charset explicitly by calling:
$file = htmlentities($file,ENT_QUOTES,'UTF-8');
But this also returns an empty string.
If I call: $file = htmlentities($file,ENT_IGNORE,'UTF-8'); the e-acute character is just dropped (so tést.pdf becomes tst.pdf).
When debugging my php script with xdebug I can see the source string contains an unknown character (looks like this).
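The symptoms above can be reproduced in isolation. This is only a sketch, and it assumes the file name is stored in a single-byte encoding where é is the byte 0xE9; the sample string is an assumption, not the actual name on the server.

```php
<?php
// Assumption: "tést.pdf" as ISO-8859-1 / Windows-1252 bytes. The lone byte
// 0xE9 is not a well-formed UTF-8 sequence.
$file = "t\xE9st.pdf";

// Invalid input for the given charset: htmlentities() returns an empty string
$strict = htmlentities($file, ENT_QUOTES, 'UTF-8');

// ENT_IGNORE silently discards the invalid byte instead
$ignored = htmlentities($file, ENT_IGNORE, 'UTF-8');

// bin2hex() shows the raw bytes without relying on the debugger's rendering
$hex = bin2hex($file);

var_dump($strict, $ignored, $hex);
```

If the hex dump of the real file name shows a single byte such as e9 where the accented character should be, the name is not UTF-8.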
So I'm at my wits' end trying to find a solution for this. Any help would be welcome.
FYI:
My best guess is that the filename itself isn't using UTF-8, or at least scandir() isn't picking it up like that. Maybe mb_detect_encoding() can shed some light?
var_dump(mb_detect_encoding($filename));
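Note that mb_detect_encoding() is a heuristic, and without strict mode it returns its closest guess rather than failing. A slightly more defensive sketch, assuming the sample bytes below stand in for the real file name, first checks whether the name is valid UTF-8 at all:

```php
<?php
// Sample input (assumption): "tést.pdf" with é stored as the single byte 0xE9.
$filename = "t\xE9st.pdf";

// true only if the whole string is well-formed UTF-8
$isUtf8 = mb_check_encoding($filename, 'UTF-8');

// Strict detection against an explicit candidate list;
// returns false if the string is valid in none of them
$detected = mb_detect_encoding($filename, ['UTF-8', 'Windows-1252', 'ISO-8859-1'], true);

var_dump($isUtf8, $detected);
```

Single-byte encodings can rarely be told apart from the bytes alone, so treat the detected name as a hint, not as proof.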
If not, try to guess the encoding (CP1252 or ISO-8859-1 would be my first guess) and convert it to UTF-8, see if the output is valid:
var_dump(mb_convert_encoding($filename, 'UTF-8', 'Windows-1252'));
var_dump(mb_convert_encoding($filename, 'UTF-8', 'ISO-8859-1'));
var_dump(mb_convert_encoding($filename, 'UTF-8', 'ISO-8859-15'));
Or using iconv():
var_dump(iconv('WINDOWS-1252', 'UTF-8', $filename));
var_dump(iconv('ISO-8859-1', 'UTF-8', $filename));
var_dump(iconv('ISO-8859-15', 'UTF-8', $filename));
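The same comparison can be written as a loop that also prints the resulting bytes. This is a sketch with an assumed sample string; swap in a real entry from scandir() when testing.

```php
<?php
// Sample input (assumption): "tést.pdf" with é stored as the single byte 0xE9.
$filename = "t\xE9st.pdf";

$results = [];
foreach (['Windows-1252', 'ISO-8859-1', 'ISO-8859-15'] as $from) {
    // Convert from the guessed source encoding to UTF-8
    $results[$from] = mb_convert_encoding($filename, 'UTF-8', $from);
}

foreach ($results as $from => $utf8) {
    // bin2hex() shows the raw bytes, independent of terminal or browser encoding
    printf("%-13s %s (%s)\n", $from, $utf8, bin2hex($utf8));
}
```

Because almost every byte is valid in these single-byte encodings, the conversion will succeed even for a wrong guess, so the output has to be checked by eye. For é all three candidates agree; they only differ for characters such as the euro sign.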
Then when you've figured out which encoding is actually used, your code should look somewhat like this (assuming CP1252):
$filename = htmlentities(mb_convert_encoding($filename, 'UTF-8', 'Windows-1252'), ENT_QUOTES, 'UTF-8');
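If some file names on the server are already UTF-8 and others are not, converting everything unconditionally would double-encode the valid ones. A minimal sketch of a guard around the conversion follows; the function name filename_to_html() and the Windows-1252 fallback are assumptions for illustration, not part of jqueryFileTree.

```php
<?php
/**
 * Hypothetical helper: returns an HTML-safe version of a file name.
 * Names that are not valid UTF-8 are assumed to be in $fallback.
 */
function filename_to_html(string $filename, string $fallback = 'Windows-1252'): string
{
    // Only convert when the name is not already well-formed UTF-8
    if (!mb_check_encoding($filename, 'UTF-8')) {
        $filename = mb_convert_encoding($filename, 'UTF-8', $fallback);
    }

    return htmlentities($filename, ENT_QUOTES, 'UTF-8');
}

echo filename_to_html("t\xE9st.pdf"), "\n";     // legacy single-byte name
echo filename_to_html("t\xC3\xA9st.pdf"), "\n"; // name that is already UTF-8
```

Both calls produce the same entity-encoded name, so the connector can pass every scandir() entry through the helper without knowing in advance how it was stored.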