 

Filenames in non-English languages are not displayed correctly when downloading a file

Tags:

php

download

When I try to download a file whose name contains non-ASCII characters (e.g. Chinese, Japanese, etc.), the downloaded file name is garbled. How can I fix this?

I have tried putting charset=UTF-8 in the Content-type header, but with no success. Code below:

header("Cache-Control: "); // leave blank to avoid IE errors
header("Pragma: ");        // leave blank to avoid IE errors
header("Content-type: application/octet-stream");
header("Content-Disposition: attachment; filename=\"".$instance_name."\"");
header("Content-length: ".(string)(filesize($fileString)));
sleep(1);
fpassthru($fdl);
asked Apr 05 '10 at 12:04 by pks83



1 Answer

Unfortunately there is currently not a single solution that works with all browsers. There are at least three "more obvious" approaches to the problem.

a) Content-type: application/octet-stream; charset=utf-8 + filename=<utf8 byte sequence>
e.g. filename=Москва.txt
This violates the standard, but Firefox shows the name correctly. IE doesn't.

b) Content-type: application/octet-stream; charset=utf-8 + filename=<urlencode(utf8 byte sequence)>
e.g. filename=%D0%9C%D0%BE%D1%81%D0%BA%D0%B2%D0%B0.txt
This works with IE but not with Firefox.

c) Providing the name as specified in RFC 2231
e.g. filename*=UTF-8''%D0%9C%D0%BE%D1%81%D0%BA%D0%B2%D0%B0.txt
Again, Firefox supports this; IE doesn't.

For a more comprehensive comparison, see http://greenbytes.de/tech/tc2231/
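To make the three variants concrete, here is a minimal sketch that only builds the header values for a) to c) from a UTF-8 name; it does not settle which one a given browser honours. The function name `contentDispositionVariants` is hypothetical, and the quoted `filename="..."` form follows the question's code rather than the unquoted examples above.

```php
<?php
// Hypothetical helper: returns the Content-Disposition values for
// approaches a), b) and c) described above, keyed by letter.
function contentDispositionVariants(string $utf8Name): array
{
    return [
        // a) raw UTF-8 bytes inside the filename parameter (non-standard)
        'a' => 'attachment; filename="' . $utf8Name . '"',
        // b) percent-encoded UTF-8 bytes inside the plain filename parameter
        'b' => 'attachment; filename="' . rawurlencode($utf8Name) . '"',
        // c) RFC 2231 extended parameter: charset, empty language tag, encoded value
        'c' => "attachment; filename*=UTF-8''" . rawurlencode($utf8Name),
    ];
}

foreach (contentDispositionVariants('Москва.txt') as $letter => $value) {
    echo "$letter) Content-Disposition: $value\n";
}
```

`rawurlencode()` is used instead of `urlencode()` because it produces `%20` for spaces rather than `+`, which is what the percent-encoded forms expect.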


Edit: When I said that there is no single solution, I meant via header('...'). But there is a workaround.
When there is no usable filename=xyz header, browsers use the basename of the path part of the URL. E.g. for <a href="test.php/lala.txt"> both Firefox and IE suggest lala.txt as the filename.
You can append extra path components after the actual path to your PHP script (when using Apache httpd, see the AcceptPathInfo directive: http://httpd.apache.org/docs/2.1/mod/core.html#acceptpathinfo).
E.g. if you have a file test.php in your document root and request it as http://localhost/test.php/x/y/z, the variable $_SERVER['PATH_INFO'] will contain /x/y/z.
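The PATH_INFO mechanism can be sketched as a small function that extracts the file id from the extra path components; it uses the same regular expression as the full example later in this answer. The name `downloadIdFromPathInfo` is hypothetical, and in a real request its argument would be `$_SERVER['PATH_INFO'] ?? null`.

```php
<?php
// Hypothetical helper: for /download/<id>/<anything>, return <id>.
// Everything after the id is ignored; it only matters to the browser.
function downloadIdFromPathInfo(?string $pathInfo): ?string
{
    if ($pathInfo === null) {
        return null; // the script was requested without extra path components
    }
    if (preg_match('!^/download/([^/]+)!', $pathInfo, $m) === 1) {
        return $m[1];
    }
    return null; // extra path present, but not of the /download/<id> form
}

var_dump(downloadIdFromPathInfo('/download/moskwa/Москва.txt')); // string(6) "moskwa"
var_dump(downloadIdFromPathInfo('/x/y/z'));                      // NULL
```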
Now, if you put a link like

<a
  href="/test.php/download/moskwa/&#x41c;&#x43e;&#x441;&#x43a;&#x432;&#x430;"
>
  &#x41c;&#x43e;&#x441;&#x43a;&#x432;&#x430;
</a>

in your document, you can fetch the download/moskwa/... part and initiate the download of the file. Without sending any filename=... information, both Firefox and IE suggest the "right" name.
You can even combine this with sending the name according to RFC 2231. That's why I also put moskwa into the link: it is the id the script uses to find the file it is supposed to send. IE ignores the filename*=... information and still uses the basename of the URL path to suggest a name. That means for Firefox (and any other client that supports RFC 2231) the part after the id is meaningless*, but for IE (and other clients that don't support RFC 2231) it is used for the name suggestion.
Self-contained example:
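Combining the two ideas means generating a link whose last path segment is the encoded name, and sending the same name via filename*=. Below is a minimal sketch; the helper names `downloadHref` and `rfc2231Disposition` are hypothetical, and the `/test.php` script path is assumed from the example.

```php
<?php
// Hypothetical helper: /<script>/download/<id>/<percent-encoded UTF-8 name>.
// The script only reads <id>; the last segment is what clients that ignore
// filename*= fall back to as the suggested name.
function downloadHref(string $script, string $id, string $utf8Name): string
{
    // rawurlencode() turns each non-ASCII UTF-8 byte into %XX (and "/" into %2F),
    // so the name stays a single path segment.
    return $script . '/download/' . rawurlencode($id) . '/' . rawurlencode($utf8Name);
}

// Hypothetical helper: the RFC 2231 header sent by the download script.
function rfc2231Disposition(string $utf8Name): string
{
    return "Content-Disposition: attachment; filename*=UTF-8''" . rawurlencode($utf8Name);
}

$name = 'Москва.txt';
echo downloadHref('/test.php', 'moskwa', $name), "\n";
echo rfc2231Disposition($name), "\n";
```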

<?php // test.php
$files = array(
  'moskwa'=>array(
    'htmlentities'=>'&#x41c;&#x43e;&#x441;&#x43a;&#x432;&#x430;',
    'content'=>'55° 45′ N, 37° 37′ E'
  ),
  'athen'=>array(
    'htmlentities'=>'&#x391;&#x3b8;&#x3ae;&#x3bd;&#x3b1;',
    'content'=>'37° 59′ N, 23° 44′ E'
  )
);


$fileid = null;
if ( isset($_SERVER['PATH_INFO']) && preg_match('!^/download/([^/]+)!', $_SERVER['PATH_INFO'], $m) ) {
  $fileid = $m[1];
}

if ( is_null($fileid) ) {
  foreach($files as $fileid=>$bar) {
    printf(
      '<a href="./test.php/download/%s/%s.txt">%s</a><br />', 
      $fileid, $bar['htmlentities'], $bar['htmlentities']
    );
  }  
}
else if ( !isset($files[$fileid]) ) {
  echo 'no such file';
}
else {
  $f = $files[$fileid];
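  // Decode the HTML entities into a UTF-8 string (mb_convert_encoding() with
  // 'HTML-ENTITIES' does the same but is deprecated as of PHP 8.2)
  $utf8name = html_entity_decode($f['htmlentities'], ENT_QUOTES | ENT_HTML5, 'UTF-8');
  // RFC 2231 expects %XX escapes; rawurlencode() encodes spaces as %20, not "+"
  $utf8name = rawurlencode($utf8name);

  // The content itself contains non-ASCII characters (°, ′), so declare its charset
  header("Content-type: text/plain; charset=utf-8");
  header("Content-Disposition: attachment; filename*=UTF-8''$utf8name.txt");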
  header("Content-length: " . strlen($f['content']));
  echo $f['content'];
}
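Not part of the original answer: RFC 6266 (2011) specifies that when both filename and filename* are present, recipients should pick filename* and ignore filename. A commonly used pattern is therefore to send an ASCII-only fallback alongside the encoded name. This is a sketch under that assumption; `contentDispositionWithFallback` is hypothetical, and whether a particular (especially older) browser honours it still needs testing.

```php
<?php
// Hypothetical helper: ASCII fallback in filename=, full UTF-8 name in filename*=.
function contentDispositionWithFallback(string $utf8Name): string
{
    // Replace each run of bytes outside printable ASCII with a single "_"
    $ascii = preg_replace('/[^\x20-\x7E]+/', '_', $utf8Name);
    // Quotes and backslashes would break the quoted-string, so replace them too
    $ascii = str_replace(['"', '\\'], '_', $ascii);

    return sprintf(
        "Content-Disposition: attachment; filename=\"%s\"; filename*=UTF-8''%s",
        $ascii,
        rawurlencode($utf8Name)
    );
}

echo contentDispositionWithFallback('Москва.txt'), "\n";
```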

*) That's a bit like here on Stack Overflow. The link for this question is shown as

http://stackoverflow.com/questions/2578349/while-downloading-filenames-from-non-english-languages-are-not-getting-displayed

but it also works with

http://stackoverflow.com/questions/2578349/mary-had-a-little-lamb

The important part is the id, 2578349.

answered Nov 11 '22 at 03:11 by VolkerK
