Is Java's URI.resolve incompatible with RFC 3986 when the relative URI contains an empty path?

I believe the definition and implementation of Java's URI.resolve method are incompatible with RFC 3986 section 5.2.2. I understand that the Java API defines how that method works, and that changing it now would break existing apps, but my question is this: can anyone confirm my understanding that this method is incompatible with RFC 3986?

I'm using the example from this question: java.net.URI resolve against only query string, which I will copy here:


I'm trying to build URIs using the JDK's java.net.URI. I want to append a query (as a String) to an absolute URI object. For example:

URI base = new URI("http://example.com/something/more/long");
String queryString = "query=http://local:282/rand&action=aaaa";
URI query = new URI(null, null, null, queryString, null);
URI result = base.resolve(query);

In theory (or so I think), resolve should return:

http://example.com/something/more/long?query=http://local:282/rand&action=aaaa

But what I got is:

http://example.com/something/more/?query=http://local:282/rand&action=aaaa

My understanding of RFC 3986 section 5.2.2 is that if the path of the relative URI is empty, then the entire path of the base URI is to be used:

        if (R.path == "") then
           T.path = Base.path;
           if defined(R.query) then
              T.query = R.query;
           else
              T.query = Base.query;
           endif;

and only if a path is specified is the relative path to be merged against the base path:

        else
           if (R.path starts-with "/") then
              T.path = remove_dot_segments(R.path);
           else
              T.path = merge(Base.path, R.path);
              T.path = remove_dot_segments(T.path);
           endif;
           T.query = R.query;
        endif;

but the Java implementation always does the merge, even if the path is empty:

    String cp = (child.path == null) ? "" : child.path;
    if ((cp.length() > 0) && (cp.charAt(0) == '/')) {
      // 5.2 (5): Child path is absolute
      ru.path = child.path;
    } else {
      // 5.2 (6): Resolve relative path
      ru.path = resolvePath(base.path, cp, base.isAbsolute());
    }

If my reading is correct, the only way to get Java's result out of the RFC pseudocode is to put a dot as the path in the relative URI, before the query string. That matches what I would expect from my experience using relative URIs as links in web pages:

transform(Base="http://example.com/something/more/long", R=".?query")
    => T="http://example.com/something/more/?query"

But I would expect, in a web page, that a link on the page "http://example.com/something/more/long" to "?query" would go to "http://example.com/something/more/long?query", not "http://example.com/something/more/?query" - in other words, consistent with the RFC, but not with the Java implementation.

Is my reading of the RFC correct, and the Java method inconsistent with it, or am I missing something?

Martin Pain asked Mar 05 '14 16:03


2 Answers

Yes, I agree that the URI.resolve(URI) method is incompatible with RFC 3986. The original question, on its own, presents a fantastic amount of research that contributes to this conclusion. First, let's clear up any confusion.

As Raedwald explained (in a now deleted answer), there is a distinction between base paths that end or do not end with /:

  • fizz relative to /foo/bar is: /foo/fizz
  • fizz relative to /foo/bar/ is: /foo/bar/fizz
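The same distinction can be checked directly against java.net.URI. This is only a minimal sketch; the class name TrailingSlashDemo and the helper resolvePath are illustrative and not part of any API:

```java
import java.net.URI;

class TrailingSlashDemo {

    // Resolves a relative reference against a base path using java.net.URI.
    static String resolvePath(String basePath, String reference) {
        return URI.create(basePath).resolve(reference).toString();
    }

    public static void main(String[] args) {
        // Base without a trailing slash: the last segment ("bar") is replaced.
        System.out.println(resolvePath("/foo/bar", "fizz"));   // /foo/fizz
        // Base with a trailing slash: the reference is appended.
        System.out.println(resolvePath("/foo/bar/", "fizz"));  // /foo/bar/fizz
    }
}
```

Both results agree with RFC 3986, because the reference has a non-empty path here.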

While correct, it's not a complete answer because the original question is not asking about a path (i.e. "fizz", above). Instead, the question is concerned with the separate query component of the relative URI reference. The URI class constructor used in the example code accepts five distinct String arguments, and all but the queryString argument were passed as null. (Note that Java accepts a null String as the path parameter and this logically results in an "empty" path component because "the path component is never undefined" though it "may be empty (zero length)".) This will be important later.
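To see what that constructor call produces, a small sketch like the following can be used. The class name QueryOnlyReferenceDemo and the helper queryOnly are made up for illustration, and the query value is arbitrary:

```java
import java.net.URI;
import java.net.URISyntaxException;

class QueryOnlyReferenceDemo {

    // Builds a relative reference that consists of a query component only,
    // passing null for scheme, authority, path and fragment.
    static URI queryOnly(String query) {
        try {
            return new URI(null, null, null, query, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        URI ref = queryOnly("action=aaaa");
        System.out.println("reference:        " + ref);
        // The path is reported as an empty string, not as null.
        System.out.println("path is empty:    " + "".equals(ref.getPath()));
        System.out.println("scheme undefined: " + (ref.getScheme() == null));
    }
}
```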

In an earlier comment, Sajan Chandran pointed out that the java.net.URI class is documented to implement RFC 2396 and not the subject of the question, RFC 3986. The former was obsoleted by the latter in 2005. That the URI class Javadoc does not mention the newer RFC could be interpreted as more evidence of its incompatibility. Let's pile on some more:

  • JDK-6791060 is an open issue that suggests this class "should be updated for RFC 3986". A comment there warns that "RFC3986 is not completely backwards compatible with 2396".

  • Previous attempts were made to update parts of the URI class to be compliant with RFC 3986, such as JDK-6348622, but were then rolled back for breaking backwards compatibility. (Also see this discussion on the JDK mailing list.)

  • Although the path "merge" logic sounds similar, as noted by SubOptimal, the pseudocode specified in the newer RFC does not match the actual implementation. In the pseudocode, when the relative URI's path is empty, then the resulting target path is copied as-is from the base URI. The "merge" logic is not executed under those conditions. Contrary to that specification, Java's URI implementation trims the base path after the last / character, as observed in the question.
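The empty-path branch of the RFC 3986 pseudocode can be sketched on top of java.net.URI roughly as follows. This is an illustration under assumptions, not a complete or verified RFC 3986 resolver: the class name EmptyPathResolve and the method resolve are hypothetical, and every case other than "reference with no scheme, no authority and an empty path" is simply delegated to the JDK's own resolve:

```java
import java.net.URI;

class EmptyPathResolve {

    // Hypothetical helper: applies the RFC 3986 section 5.2.2 rule for an
    // empty reference path and delegates all other cases to URI.resolve.
    static URI resolve(URI base, URI ref) {
        if (base.isOpaque() || ref.isOpaque()
                || ref.getScheme() != null
                || ref.getRawAuthority() != null
                || !ref.getRawPath().isEmpty()) {
            return base.resolve(ref);
        }
        // T.path = Base.path; T.query = R.query if defined, else Base.query.
        String query = ref.getRawQuery() != null ? ref.getRawQuery() : base.getRawQuery();

        StringBuilder target = new StringBuilder();
        if (base.getScheme() != null) {
            target.append(base.getScheme()).append(':');
        }
        if (base.getRawAuthority() != null) {
            target.append("//").append(base.getRawAuthority());
        }
        target.append(base.getRawPath());
        if (query != null) {
            target.append('?').append(query);
        }
        // T.fragment = R.fragment
        if (ref.getRawFragment() != null) {
            target.append('#').append(ref.getRawFragment());
        }
        return URI.create(target.toString());
    }

    public static void main(String[] args) {
        URI base = URI.create("http://example.com/something/more/long");
        URI ref = URI.create("?query=http://local:282/rand&action=aaaa");
        System.out.println("JDK resolve: " + base.resolve(ref));
        System.out.println("RFC sketch:  " + resolve(base, ref));
    }
}
```

The raw (still encoded) components are used throughout so that reassembling the string does not decode and re-encode anything.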

There are alternatives to the URI class, if you want RFC 3986 behavior. Java EE 6 implementations provide javax.ws.rs.core.UriBuilder, which (in Jersey 1.18) seems to behave as you expected (see below). It at least claims awareness of the RFC as far as encoding different URI components is concerned.

Outside of Java EE, Spring 3.0 introduced UriUtils, specifically documented for "encoding and decoding based on RFC 3986". Spring 3.1 deprecated some of that functionality and introduced the UriComponentsBuilder, but it does not document adherence to any specific RFC, unfortunately.


Test program, demonstrating different behaviors:

import java.net.*;
import java.util.*;
import java.util.function.*;
import javax.ws.rs.core.UriBuilder; // using Jersey 1.18

public class StackOverflow22203111 {

    private URI withResolveURI(URI base, String targetQuery) {
        URI reference = queryOnlyURI(targetQuery);
        return base.resolve(reference);
    }
 
    private URI withUriBuilderReplaceQuery(URI base, String targetQuery) {
        UriBuilder builder = UriBuilder.fromUri(base);
        return builder.replaceQuery(targetQuery).build();
    }

    private URI withUriBuilderMergeURI(URI base, String targetQuery) {
        URI reference = queryOnlyURI(targetQuery);
        UriBuilder builder = UriBuilder.fromUri(base);
        return builder.uri(reference).build();
    }

    public static void main(String... args) throws Exception {

        final URI base = new URI("http://example.com/something/more/long");
        final String queryString = "query=http://local:282/rand&action=aaaa";
        final String expected =
            "http://example.com/something/more/long?query=http://local:282/rand&action=aaaa";

        StackOverflow22203111 test = new StackOverflow22203111();
        Map<String, BiFunction<URI, String, URI>> strategies = new LinkedHashMap<>();
        strategies.put("URI.resolve(URI)", test::withResolveURI);
        strategies.put("UriBuilder.replaceQuery(String)", test::withUriBuilderReplaceQuery);
        strategies.put("UriBuilder.uri(URI)", test::withUriBuilderMergeURI);

        strategies.forEach((name, method) -> {
            System.out.println(name);
            URI result = method.apply(base, queryString);
            if (expected.equals(result.toString())) {
                System.out.println("   MATCHES: " + result);
            }
            else {
                System.out.println("  EXPECTED: " + expected);
                System.out.println("   but WAS: " + result);
            }
        });
    }

    private URI queryOnlyURI(String queryString)
    {
        try {
            String scheme = null;
            String authority = null;
            String path = null;
            String fragment = null;
            return new URI(scheme, authority, path, queryString, fragment);
        }
        catch (URISyntaxException syntaxError) {
            throw new IllegalStateException("unexpected", syntaxError);
        }
    }
}

Outputs:

URI.resolve(URI)
  EXPECTED: http://example.com/something/more/long?query=http://local:282/rand&action=aaaa
   but WAS: http://example.com/something/more/?query=http://local:282/rand&action=aaaa
UriBuilder.replaceQuery(String)
   MATCHES: http://example.com/something/more/long?query=http://local:282/rand&action=aaaa
UriBuilder.uri(URI)
   MATCHES: http://example.com/something/more/long?query=http://local:282/rand&action=aaaa
William Price answered Oct 20 '22 00:10


If you want better [1] behavior from URI.resolve() and do not want to include another large dependency [2] in your program, then I found the following code to work well within my requirements:

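// Requires java.net.URI, java.net.URISyntaxException and Guava's com.google.common.base.Strings.
public URI resolve(URI base, URI relative) throws URISyntaxException {
    // An empty base path is treated as the root path.
    if (Strings.isNullOrEmpty(base.getPath()))
        base = new URI(base.getScheme(), base.getAuthority(), "/",
            base.getQuery(), base.getFragment());
    // An empty relative path inherits the full base path, so resolve() has nothing to trim.
    if (Strings.isNullOrEmpty(relative.getPath()))
        relative = new URI(relative.getScheme(), relative.getAuthority(), base.getPath(),
            relative.getQuery(), relative.getFragment());
    return base.resolve(relative);
}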

The only non-JDK thing there is Strings from Guava, for readability - replace with your own 1-line-method if you don't have Guava.
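For reference, a JDK-only variant of the same workaround might look like the sketch below. The class name JdkOnlyResolve and the isNullOrEmpty stand-in are assumptions made for this example; the checked URISyntaxException is wrapped so callers do not have to declare it. Like the code above, it makes no claim of full RFC 3986 compliance:

```java
import java.net.URI;
import java.net.URISyntaxException;

class JdkOnlyResolve {

    // Stand-in for Guava's Strings.isNullOrEmpty.
    static boolean isNullOrEmpty(String s) {
        return s == null || s.isEmpty();
    }

    static URI resolve(URI base, URI relative) {
        try {
            // An empty base path is treated as the root path.
            if (isNullOrEmpty(base.getPath())) {
                base = new URI(base.getScheme(), base.getAuthority(), "/",
                        base.getQuery(), base.getFragment());
            }
            // An empty relative path inherits the full base path.
            if (isNullOrEmpty(relative.getPath())) {
                relative = new URI(relative.getScheme(), relative.getAuthority(),
                        base.getPath(), relative.getQuery(), relative.getFragment());
            }
            return base.resolve(relative);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        URI base = URI.create("http://example.com/something/more/long");
        System.out.println(resolve(base, URI.create("?query=x")));
    }
}
```

One limitation to keep in mind: a relative reference that carries its own scheme or authority but an empty path would also inherit the base path here, which is probably not what you want in that case.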

Footnotes:

  1. I cannot claim that the simple code sample here is RFC3986 compliant.
  2. Such as Spring, javax.ws or - as mentioned in this answer - Apache HTTPClient.
Guss answered Oct 19 '22 22:10