I have a servlet that needs to write out files that have a user-configurable name. I am trying to use URI encoding to properly escape special characters, but the JRE appears to automatically convert encoded forward slashes (%2F) into path separators.
Example:
File dir = new File("C:\\Documents and Settings\\username\\temp");
String fn = "Top 1/2.pdf";
URI uri = new URI( dir.toURI().toASCIIString() + URLEncoder.encoder( fn, "ASCII" ).toString() );
File out = new File( uri );
System.out.println( dir.toURI().toASCIIString() );
System.out.println( URLEncoder.encode( fn, "ASCII" ).toString() );
System.out.println( uri.toASCIIString() );
System.out.println( out.toURI().toASCIIString() );
The output is:
file:/C:/Documents%20and%20Settings/username/temp/
Top+1%2F2.pdf
file:/C:/Documents%20and%20Settings/username/temp/Top+1%2F2.pdf
file:/C:/Documents%20and%20Settings/username/temp/Top+1/2.pdf
After the new File object is instantiated, the %2F sequence is automatically converted to a forward slash and I end up with an incorrect path. Does anybody know the proper way to approach this issue?
The core of the problem seems to be that
uri.equals( new File(uri).toURI() ) == FALSE
when there is a %2F in the URI.
I'm planning to just use the URLEncoded string verbatim rather than trying to use the File(uri) constructor.
The new File(URI) constructor builds the file from the path as obtained by URI#getPath() instead of (as you expected) URI#getRawPath(). This looks like a feature "by design".
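To make the getPath()/getRawPath() difference concrete, here is a minimal sketch. The class name, helper names, and the example URI are made up for illustration; only the java.net.URI calls are real API.

```java
import java.net.URI;

class RawPathDemo {
    // Decoded path: percent-escapes such as %2F are turned back into characters.
    static String decodedPath(String uriText) {
        return URI.create(uriText).getPath();
    }

    // Raw path: the path exactly as it appears in the URI, escapes intact.
    static String rawPath(String uriText) {
        return URI.create(uriText).getRawPath();
    }

    public static void main(String[] args) {
        String uriText = "file:/C:/temp/Top+1%2F2.pdf"; // hypothetical example URI
        System.out.println(decodedPath(uriText)); // /C:/temp/Top+1/2.pdf
        System.out.println(rawPath(uriText));     // /C:/temp/Top+1%2F2.pdf
    }
}
```

The decoded form is what turns the escaped slash back into a path separator once the path is handed to File.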
You have 2 options:
1. Run URLEncoder#encode() on fn twice (note: encode(), not encoder()).
2. Use new File(String) instead.

I think that @BalusC has nailed the direct problem in your code. I'd just like to point out some other issues.
The dir.toURI().toASCIIString() and URLEncoder.encode(fn, "UTF-8").toString() expressions actually do rather different things.
The first one encodes the URI as a string, applying the URI encoding rules according to the URI grammar. So, for example, a '/' in the path component will not be encoded, but a '/' in the query or fragment components will be encoded as %2F.
The second one encodes the fn String, applying the encoding rules to every character without reference to the content of the string (so a '/' always becomes %2F).
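A small sketch of that difference, applied to the same text. The class and helper names are hypothetical, and the multi-argument URI constructor is used here as a stand-in for what File#toURI() produces, so the path is an assumed example rather than the real directory.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLEncoder;

class EncodingComparison {
    // URI-grammar quoting of a path: illegal characters such as the space are
    // escaped, but '/' is left alone because it is the path separator.
    static String asFileUri(String absolutePath) {
        try {
            return new URI("file", null, absolutePath, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Form-style encoding: every reserved character is escaped, including '/'.
    static String formEncode(String text) {
        try {
            return URLEncoder.encode(text, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    public static void main(String[] args) {
        System.out.println(asFileUri("/temp/Top 1/2.pdf")); // file:/temp/Top%201/2.pdf
        System.out.println(formEncode("Top 1/2.pdf"));      // Top+1%2F2.pdf
    }
}
```

Note that the two schemes do not even agree on the space: the URI form uses %20, while URLEncoder uses '+'.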
The File(URI) constructor's mapping from a file URI to a File is system dependent and undocumented. I'm a bit surprised that it decodes the %2F, but it does what it does, and @BalusC explains why. The take-away is that it is potentially problematic to use a mechanism ("file:" URIs) that is explicitly system dependent.
Finally, it is wrong to combine those URI component strings like that. It should be either
URI uri = new URI(
dir.toURI().toString() +
URLEncoder.encode(fn, "UTF-8").toString());
or
URI uri = new URI(
dir.toURI().toASCIIString() +
URLEncoder.encode(fn, "ASCII").toString());
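For completeness, here is a rough sketch of the two workarounds suggested in the first answer. The class and method names are hypothetical, and the second method uses the File(File, String) constructor as a variant of new File(String), so treat it as one possible approach rather than the definitive fix.

```java
import java.io.File;
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URLEncoder;

class EncodedFileName {
    static String encode(String text) {
        try {
            return URLEncoder.encode(text, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    // Option 1: encode twice, so the single decoding pass done by
    // URI#getPath() still leaves one layer of escaping in the path.
    static String pathAfterDoubleEncoding(String dirUri, String fileName) {
        URI uri = URI.create(dirUri + encode(encode(fileName)));
        return uri.getPath();
    }

    // Option 2: skip the URI round trip and pass the encoded name
    // directly to a File constructor.
    static File viaFileConstructor(File dir, String fileName) {
        return new File(dir, encode(fileName));
    }

    public static void main(String[] args) {
        // Directory values below are assumed examples.
        System.out.println(pathAfterDoubleEncoding("file:/C:/temp/", "Top 1/2.pdf"));
        // /C:/temp/Top+1%2F2.pdf
        System.out.println(viaFileConstructor(new File("temp"), "Top 1/2.pdf").getName());
        // Top+1%2F2.pdf
    }
}
```

Either way the name on disk is the encoded string, so reading the original name back means running it through URLDecoder.decode().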