
Why does Java automatically decode %2F in URI encoded filenames?

I have a servlet that needs to write out files that have a user-configurable name. I am trying to use URI encoding to properly escape special characters, but the JRE appears to automatically convert encoded forward slashes %2F into path separators.

Example:

File   dir = new File("C:\\Documents and Settings\\username\\temp");
String fn  = "Top 1/2.pdf";
URI    uri = new URI( dir.toURI().toASCIIString() + URLEncoder.encode( fn, "ASCII" ) );
File   out = new File( uri );

System.out.println( dir.toURI().toASCIIString() );
System.out.println( URLEncoder.encode( fn, "ASCII" ) );
System.out.println( uri.toASCIIString() );
System.out.println( out.toURI().toASCIIString() );

The output is:

file:/C:/Documents%20and%20Settings/username/temp/
Top+1%2F2.pdf   
file:/C:/Documents%20and%20Settings/username/temp/Top+1%2F2.pdf
file:/C:/Documents%20and%20Settings/username/temp/Top+1/2.pdf

After the new File object is instantiated, the %2F sequence is automatically converted to a forward slash and I end up with an incorrect path. Does anybody know the proper way to approach this issue?

The core of the problem seems to be that

uri.equals( new File(uri).toURI() ) == false

when there is a %2F in the URI.

I'm planning to just use the URLEncoded string verbatim rather than trying to use the File(uri) constructor.

Lucas asked May 04 '10 13:05


2 Answers

The new File(URI) constructor builds the file from the decoded path returned by URI#getPath(), not from the raw path returned by URI#getRawPath() as you expected. This looks like a feature "by design".
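To see the difference between the two accessors in isolation, here is a minimal sketch. The class name and the sample URI are made up for illustration; only URI#getPath() and URI#getRawPath() come from the discussion above.

```java
import java.net.URI;

public class RawPathDemo {
    // Hypothetical sample; any file: URI containing %2F behaves the same way.
    static final URI SAMPLE = URI.create("file:/tmp/Top+1%2F2.pdf");

    public static void main(String[] args) {
        // getRawPath() returns the path with its percent-escapes untouched.
        System.out.println(SAMPLE.getRawPath()); // /tmp/Top+1%2F2.pdf

        // getPath() decodes the escapes, so %2F becomes a real '/'.
        // The '+' is left alone: URI decoding does not treat it as a space.
        System.out.println(SAMPLE.getPath());    // /tmp/Top+1/2.pdf
    }
}
```

Since File(URI) works from the decoded form, the escaped slash is indistinguishable from a path separator by the time the File is built.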

You have 2 options:

  1. Run URLEncoder#encode() on fn twice (note: encode(), not encoder()).
  2. Use new File(String) instead.
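A rough sketch of both options, in case it helps. The class and method names are hypothetical, the temp directory stands in for the real target directory, and option 2 is shown with the two-argument File(String, String) constructor as one way of passing the name as a plain string.

```java
import java.io.File;
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URLEncoder;

public class EncodedFileNameDemo {

    // Wraps URLEncoder.encode so callers need not handle the checked exception.
    static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    // Option 1: encode twice, so the single decode done by File(URI)
    // still leaves "%2F" (not '/') in the file name.
    static File viaDoubleEncoding(File dir, String fn) {
        String base = dir.toURI().toASCIIString();
        if (!base.endsWith("/")) {
            base += "/"; // toURI() only appends '/' for an existing directory
        }
        return new File(URI.create(base + encode(encode(fn))));
    }

    // Option 2: skip the URI entirely and pass the encoded name as a plain string.
    static File viaFileString(File dir, String fn) {
        return new File(dir.getPath(), encode(fn));
    }

    public static void main(String[] args) {
        // Assumption: the system temp directory is used as a stand-in target.
        File dir = new File(System.getProperty("java.io.tmpdir"));
        System.out.println(viaDoubleEncoding(dir, "Top 1/2.pdf").getName());
        System.out.println(viaFileString(dir, "Top 1/2.pdf").getName());
    }
}
```

Both calls should yield a file named Top+1%2F2.pdf. Option 2 is the simpler of the two, since it never round-trips through a URI at all.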
BalusC answered Oct 21 '22 05:10


I think that @BalusC has nailed the direct problem in your code. I'd just like to point out some other issues.

The dir.toURI().toASCIIString() and URLEncoder.encode(fn, "UTF-8") expressions actually do rather different things.

  • The first one encodes the URI as a string, applying the URI encoding rules according to the URI grammar. So, for example, a '/' in the path component is kept as a separator and not encoded, while a character that is illegal in a path, such as a space, is encoded as %20.

  • The second one encodes the fn String using the HTML form (application/x-www-form-urlencoded) rules, with no regard for URI structure. That is why the space in your file name becomes '+' and the '/' becomes %2F.
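The contrast between the two sets of rules can be sketched as follows. The class and helper names are made up, and the multi-argument URI constructor is used here only as a convenient way to apply the URI quoting rules to a path; it is not what the original code calls directly.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLEncoder;

public class EncodingRulesDemo {

    // URI rules: the path is quoted according to the URI grammar, so '/'
    // stays a separator and only illegal characters (the space) are escaped.
    static String uriEncoded(String absolutePath) {
        try {
            return new URI("file", null, absolutePath, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Form rules: every character is treated the same way, so the space
    // becomes '+' and '/' becomes %2F.
    static String formEncoded(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    public static void main(String[] args) {
        System.out.println(uriEncoded("/temp/Top 1/2.pdf")); // file:/temp/Top%201/2.pdf
        System.out.println(formEncoded("Top 1/2.pdf"));      // Top+1%2F2.pdf
    }
}
```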

The File(URI) constructor's mapping from a file URI to a File is system dependent and undocumented. I'm a bit surprised that it decodes the %2F, but it does what it does, and @BalusC explains why. The take-away is that it is potentially problematic to use a mechanism ("file:" URIs) that is explicitly system dependent.

Finally, it is wrong to combine those URI component strings like that: the two halves should use matching encodings. It should be either

URI uri = new URI(
        dir.toURI().toString() +
        URLEncoder.encode(fn, "UTF-8"));

or

URI uri = new URI(
        dir.toURI().toASCIIString() +
        URLEncoder.encode(fn, "ASCII"));
Stephen C answered Oct 21 '22 03:10