
Any way to make Android's default browser recognize non-ASCII filenames in "Content-Disposition: attachment" downloads?

First of all, I'm pretty sure this is not a duplicate because I've been researching this topic for quite some time, both on StackOverflow and elsewhere. Similar questions have been asked, but none were answered satisfactorily.

Related (but not identical) questions from the past:

  • Android Chrome browser unnecessarily renames names & types of downloaded files
  • How to encode the filename parameter of Content-Disposition header in HTTP?

I'm also fully aware of mod_rewrite tricks that make it completely unnecessary to juggle filenames in HTTP headers. But let's suppose that this is not an option.


Most modern browsers (IE9+, Firefox, Chrome) support RFC2231/5987 when downloading files with non-ASCII characters in their names. In those cases, the following PHP code works like a charm:

header("Content-Disposition: attachment; " .
       "filename*=UTF-8''" . rawurlencode($filename));

IE <= 8 doesn't understand RFC2231/5987, but the following code works most of the time. Since every browser has tried to emulate IE to some extent, this also works in many other browsers, such as Firefox.

header("Content-Disposition: attachment; " .
       'filename="' . rawurlencode($filename) . '"');

Meanwhile, Chrome < 11 and Safari < 6 seem to prefer the following, despite the fact that it places non-ASCII characters directly in the header.

header("Content-Disposition: attachment; filename=" . $filename);

So far so good.
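
For comparison outside PHP, the three header variants above can be sketched side by side in Java. This is only an illustration: the class and method names are hypothetical, and `rawUrlEncode` is a hand-written stand-in for PHP's `rawurlencode()` that assumes the filename should be encoded as UTF-8.

```java
import java.nio.charset.StandardCharsets;

final class DispositionVariants {
    // Percent-encode every UTF-8 byte outside the RFC 3986 unreserved set
    // (letters, digits, '-', '.', '_', '~'), mirroring PHP's rawurlencode().
    static String rawUrlEncode(String s) {
        StringBuilder out = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            int c = b & 0xFF;
            boolean unreserved = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
                    || (c >= '0' && c <= '9')
                    || c == '-' || c == '.' || c == '_' || c == '~';
            if (unreserved) {
                out.append((char) c);
            } else {
                out.append('%').append(String.format("%02X", c));
            }
        }
        return out.toString();
    }

    // RFC 2231/5987 extended parameter (IE9+, Firefox, Chrome).
    static String rfc5987(String filename) {
        return "attachment; filename*=UTF-8''" + rawUrlEncode(filename);
    }

    // Percent-encoded name inside a quoted plain parameter (the IE <= 8 variant).
    static String legacyQuoted(String filename) {
        return "attachment; filename=\"" + rawUrlEncode(filename) + "\"";
    }

    // Raw name placed directly in the header (the Chrome < 11 / Safari < 6 variant).
    static String raw(String filename) {
        return "attachment; filename=" + filename;
    }

    public static void main(String[] args) {
        String name = "\u0444\u0430\u0439\u043b.jpg"; // "файл.jpg" written with escapes
        System.out.println(rfc5987(name));
        System.out.println(legacyQuoted(name));
        System.out.println(raw(name));
    }
}
```

Note that the raw variant only produces a Java string here; which bytes end up on the wire depends on how the server writes the header, which this sketch does not model.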


But everything falls apart when it comes to Android's default browser app. (So far, I've tested this in Gingerbread, Ice Cream Sandwich and Jelly Bean.)

If you give it the standard RFC2231/5987 treatment, the default browser completely ignores it and tries to guess the filename from the last part of the URL.

If you give it the usual non-standard (IE <= 8) treatment, either the default browser tries to interpret the filename as ISO-8859-1, leading to an unintelligible jumble of characters, or it silently discards all non-ASCII characters. The exact behavior differs between versions, but in any case it is clear that Android's default browser was not designed to support the rawurlencode() format, either.

The same thing happens if you put the raw filename in the header.
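
The two symptoms described above (a jumble of characters, or non-ASCII characters silently disappearing) can be reproduced in isolation. The sketch below is not the browser's actual code path; it only assumes that the UTF-8 bytes of the name are either reinterpreted as ISO-8859-1 or filtered down to 7-bit ASCII. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

final class MojibakeDemo {
    // Reinterpret the UTF-8 bytes of a name as ISO-8859-1:
    // every byte becomes one (usually meaningless) character.
    static String misreadAsLatin1(String name) {
        byte[] utf8 = name.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Drop every character outside 7-bit ASCII.
    static String stripNonAscii(String name) {
        return name.replaceAll("[^\\x00-\\x7F]", "");
    }

    public static void main(String[] args) {
        String name = "\u0444\u0430\u0439\u043b.jpg"; // "файл.jpg" written with escapes
        System.out.println(misreadAsLatin1(name)); // garbled, 12 characters long
        System.out.println(stripNonAscii(name));   // only ".jpg" survives
    }
}
```

Because ISO-8859-1 maps every byte to exactly one character, the garbled form is lossless: re-encoding it as ISO-8859-1 and decoding as UTF-8 recovers the original name. That helps on the receiving side, but not when the browser has already chosen the filename.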

This is usually not an issue with third-party browsers, such as Firefox for Android, Dolphin Browser, and Boat Browser. The default browser app is the only one that consistently fails to understand UTF-8 filenames.


Perhaps this was finally fixed in a recent version of Android, or perhaps it will be fixed in the next version. But that's not my question. I need this to work in existing devices, and there are still millions of Gingerbread and ICS devices out there.

I've read the bug reports, I've read the complaints, I've read pretty much everything there is to read about this problem. So far I have been unable to find any encoding scheme that actually works.

If anyone knows how to encode a non-ASCII filename (e.g. файла파일ファイル名.jpg) in a Content-Disposition header and have the Android default browser recognize it, please share it! I don't care how hacky or non-standard it is. I don't care if it needs to be customized for each version of Android.

Update

Unfortunately, so far I have not received any answer that actually solves the problem mentioned above, so the bounty expires unclaimed. Please don't answer unless you actually know how to encode non-European, mixed-language filenames in a way that is recognized by Android Browser prior to ICS, or you have solid evidence that this is impossible.

asked Apr 01 '14 13:04 by kijin


1 Answer

In the Android source, URLUtil.java implements guessFileName, which calls parseContentDisposition. That method extracts the filename from the Content-Disposition header using this regular expression: "attachment;\\s*filename\\s*=\\s*(\"?)([^\"]*)\\1\\s*$".

The source code below, which attempts to replicate the parseContentDisposition functionality, worked correctly when I tested it. For example, it returns файла파일ファイル名.jpg.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HelloWorld {

    // Same pattern that parseContentDisposition uses, as quoted above.
    private static final Pattern CONTENT_DISPOSITION_PATTERN = Pattern.compile(
            "attachment;\\s*filename\\s*=\\s*(\"?)([^\"]*)\\1\\s*$",
            Pattern.CASE_INSENSITIVE);

    public static void main(String[] args) {
        String contentDisposition =
                "Content-Disposition: attachment; filename=\"файла파일ファイル名.jpg\"";
        try {
            Matcher m = CONTENT_DISPOSITION_PATTERN.matcher(contentDisposition);
            if (m.find()) {
                // Group 2 is the filename without the optional surrounding quotes.
                System.out.println("Result: " + m.group(2));
            }
        } catch (IllegalStateException ex) {
            // The original parseContentDisposition returns null when it can't parse the header.
        }
    }
}
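
As a follow-up sketch, the same pattern can be run against the header forms discussed in the question. It assumes the regular expression quoted above is the one actually applied, and it does not model how a device decodes the header bytes before this step. The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class HeaderFormsCheck {
    // Pattern copied from the answer above.
    static final Pattern PATTERN = Pattern.compile(
            "attachment;\\s*filename\\s*=\\s*(\"?)([^\"]*)\\1\\s*$",
            Pattern.CASE_INSENSITIVE);

    // Returns the captured filename, or null when the pattern does not match.
    static String parse(String headerValue) {
        Matcher m = PATTERN.matcher(headerValue);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        String[] samples = {
            // RFC 5987 form: "filename*" is not followed by "=", so no match.
            "attachment; filename*=UTF-8''%D1%84.jpg",
            // Quoted percent-encoded form: matches, but nothing is decoded.
            "attachment; filename=\"%D1%84.jpg\"",
            // Both parameters together: the trailing part defeats the "$" anchor.
            "attachment; filename=\"a.jpg\"; filename*=UTF-8''%D1%84.jpg",
        };
        for (String s : samples) {
            System.out.println(s + " -> " + parse(s));
        }
    }
}
```

Under these assumptions, the pattern never matches the `filename*=` form, which is consistent with the question's observation that the default browser ignores it and falls back to the URL. The percent-encoded form is captured verbatim, with no decoding step in the pattern itself.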

answered Oct 19 '22 22:10 by Appleman1234