Java: File.toURI().toURL() on Windows file

The system I'm running on is Windows XP, with JRE 1.6.

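I do this:

import java.io.File;

public class Main {
    public static void main(String[] args) {
        try {
            // Print the URL form of a Windows path that contains a space
            System.out.println(new File("C:\\test a.xml").toURI().toURL());
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}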

and I get this: file:/C:/test%20a.xml

Why doesn't the resulting URL have two slashes before the C:? I expected file://C:.... Is this normal behaviour?


EDIT:

From the Java source code, java.net.URLStreamHandler.toExternalForm(URL):

    result.append(":");
    if (u.getAuthority() != null && u.getAuthority().length() > 0) {
        result.append("//");
        result.append(u.getAuthority());
    }

It seems that the Authority part of a file URL is null or empty, and thus the double slash is skipped. So what is the authority part of a URL and is it really absent from the file protocol?

glmxndr asked Jul 15 '09 13:07


2 Answers

That's an interesting question.

First things first: I get the same results on JRE6. I even get that when I lop off the toURL() part.
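A minimal sketch of that comparison, for anyone who wants to reproduce it. The class and method names (FileUriVsUrl, uriString, urlString) are made up for illustration, and the relative name "test a.xml" is a stand-in so the sketch runs on any OS; on Windows with the path from the question, the output should match what the question reports.

```java
import java.io.File;
import java.net.MalformedURLException;

public class FileUriVsUrl {
    // String form of the URI that File.toURI() builds (hypothetical helper).
    static String uriString(String fileName) {
        return new File(fileName).toURI().toString();
    }

    // String form after the extra toURL() step (hypothetical helper).
    static String urlString(String fileName) {
        try {
            return new File(fileName).toURI().toURL().toString();
        } catch (MalformedURLException e) {
            // Not expected for a "file" URI; rethrown unchecked to keep the sketch short.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The file does not need to exist; only the name is converted.
        System.out.println(uriString("test a.xml"));
        System.out.println(urlString("test a.xml"));
    }
}
```

Both lines should print the same single-slash form, with the space encoded as %20, which suggests the missing "//" already comes from File.toURI() rather than from toURL().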

RFC2396 does not actually require two slashes. According to section 3:

The URI syntax is dependent upon the scheme. In general, absolute URI are written as follows:

<scheme>:<scheme-specific-part>

Having said that, RFC2396 has been superseded by RFC3986, which states:

The generic URI syntax consists of a hierarchical sequence of components referred to as the scheme, authority, path, query, and fragment.

  URI         = scheme ":" hier-part [ "?" query ] [ "#" fragment ]

  hier-part   = "//" authority path-abempty
              / path-absolute
              / path-rootless
              / path-empty

The scheme and path components are required, though the path may be empty (no characters). When authority is present, the path must either be empty or begin with a slash ("/") character. When authority is not present, the path cannot begin with two slash characters ("//"). These restrictions result in five different ABNF rules for a path (Section 3.3), only one of which will match any given URI reference.

So, there you go. A file URI written without an authority component, like the one in the question, is forbidden from starting with //.

However, RFC3986 wasn't published until 2005, and Java's documentation references RFC2396, so I don't know why Java follows this convention: before the new RFC, file URLs were conventionally written with two slashes.
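The authority rule quoted above can be checked against java.net.URI. This is only a sketch: the class name FileUriAuthority and the helper describe are invented for illustration, and the three URI strings are variations on the one from the question.

```java
import java.net.URI;

public class FileUriAuthority {
    // Reports how java.net.URI splits a URI string into authority and path
    // (hypothetical helper, not part of any library).
    static String describe(String text) {
        URI uri = URI.create(text);
        return "authority=" + uri.getAuthority() + ", path=" + uri.getPath();
    }

    public static void main(String[] args) {
        // No authority: the path starts with a single slash.
        System.out.println(describe("file:/C:/test%20a.xml"));
        // Empty authority between "//" and the path.
        System.out.println(describe("file:///C:/test%20a.xml"));
        // Explicit "localhost" authority.
        System.out.println(describe("file://localhost/C:/test%20a.xml"));
    }
}
```

On the JDKs I'm aware of, the first two lines report a null authority with the path /C:/test a.xml, and only the third reports localhost. So, as far as java.net.URI is concerned, file:/... and file:///... parse to the same components.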

Powerlord answered Sep 23 '22 01:09


To answer why all of these forms are valid:

file:/path/file
file:///path/file
file://localhost/path/file

RFC3986 (3.2.2. Host) states:

"If the URI scheme defines a default for host, then that default applies when the host subcomponent is undefined or when the registered name is empty (zero length). For example, the "file" URI scheme is defined so that no authority, an empty host, and "localhost" all mean the end-user's machine, whereas the "http" scheme considers a missing authority or empty host invalid."

So the "file" scheme interprets file:///path/file as referring to the end-user's machine even though the host is empty.
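One caveat on the Java side, sketched below: the File(URI) constructor is documented to require an undefined authority, so it is stricter than the RFC about the "localhost" form. The class name FileUriForms and the helper toFile are made up for this example.

```java
import java.io.File;
import java.net.URI;

public class FileUriForms {
    // Converts a file URI string to a File, or returns null when the string
    // is rejected (hypothetical helper). Note that URI.create also throws
    // IllegalArgumentException for malformed input, so null covers both cases.
    static File toFile(String text) {
        try {
            return new File(URI.create(text));
        } catch (IllegalArgumentException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        String[] forms = {
            "file:/path/file",
            "file:///path/file",
            "file://localhost/path/file"
        };
        for (String form : forms) {
            System.out.println(form + " -> " + toFile(form));
        }
    }
}
```

The first two forms should map to the same File, while the third is rejected because its authority is not empty. The RFC treats all three as the local machine, but java.io.File only accepts the first two.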

rbeede answered Sep 23 '22 01:09