
java.net.URI and percent in query parameter value

System.out.println(
    new URI("http", "example.com", "/servlet", "a=x%20y", null));

The result is http://example.com/servlet?a=x%2520y, where the query parameter value differs from the supplied one. Strange, but this does follow the Javadoc:

"The percent character ('%') is always quoted by these constructors."

We can pass the decoded string, a=x y, and then we get a reasonable(?) result: a=x%20y.

But what if the query parameter value contains an "&" character? This happens, for example, if the value is itself a URL with query parameters. Look at this (wrong) query string: a=b&c. The ampersand must be escaped here (a=b%26c); otherwise the string can be read as a query parameter a=b followed by some garbage (c). If I pass the escaped form to a URI constructor, it encodes the percent sign again and returns a wrong URL: ...?a=b%2526c
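A minimal sketch of the cases described so far, using the same five-argument constructor (scheme, authority, path, query, fragment). The class name UriQueryQuoting and the helper build are made up for illustration; the checked URISyntaxException is wrapped only to keep the calls short.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class UriQueryQuoting {
    // Hypothetical helper: builds the URI from parts and returns its string form.
    static String build(String query) {
        try {
            return new URI("http", "example.com", "/servlet", query, null).toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // An already-escaped value is escaped a second time.
        System.out.println(build("a=x%20y"));  // http://example.com/servlet?a=x%2520y
        // A decoded value is escaped once.
        System.out.println(build("a=x y"));    // http://example.com/servlet?a=x%20y
        // A literal '&' is legal in a query, so it is left alone.
        System.out.println(build("a=b&c"));    // http://example.com/servlet?a=b&c
        // Escaping the '&' by hand runs into the double-escaping again.
        System.out.println(build("a=b%26c"));  // http://example.com/servlet?a=b%2526c
    }
}
```

So with this constructor there is no input that produces a=b%26c in the resulting URL.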

This issue seems to render java.net.URI useless. Am I missing something here?

Summary of answers

java.net.URI does know about the existence of the query part of a URI, but it does not understand the internals of the query part, which can differ for each scheme. For example, java.net.URI does not understand the internal structure of the HTTP query part. This would not be a problem if java.net.URI treated the query as an opaque string and did not alter it. But it tries to apply a generic percent-encoding algorithm, which breaks HTTP URLs.

Therefore I cannot use the URI class to reliably assemble a URL from its parts, even though there are constructors for it. I would also mention that, as of Java 7, the implementation of the relativize operation is quite limited: it only works if one URL is a prefix of the other. These two features (and the class's leaner interface for these purposes) were the reason I was interested in java.net.URI, but neither of them works for me.
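The relativize limitation can be seen in a small sketch like the following. The class name RelativizeDemo, the helper relativize, and the example.com paths are made up for illustration.

```java
import java.net.URI;

public class RelativizeDemo {
    // Hypothetical helper: relativizes target against base and returns the string form.
    static String relativize(String base, String target) {
        return URI.create(base).relativize(URI.create(target)).toString();
    }

    public static void main(String[] args) {
        // The base path is a prefix of the target path: a relative reference comes back.
        System.out.println(relativize("http://example.com/a/b/", "http://example.com/a/b/c/d"));
        // prints: c/d

        // Sibling path: no "../x" is produced, the target is returned unchanged.
        System.out.println(relativize("http://example.com/a/b/", "http://example.com/a/x"));
        // prints: http://example.com/a/x
    }
}
```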

In the end I used java.net.URL for parsing, and wrote code to assemble a URL from parts and to relativize two URLs. I also checked the Apache HttpClient URIBuilder class. Although it does understand the internals of an HTTP query string, as of 4.3 it has the same encoding problem as java.net.URI when dealing with the query part as a whole.
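This is not the code I ended up with, only a rough sketch of one way to assemble such a URL: escape each parameter name and value separately, join them, and hand the finished string to the single-argument form (here URI.create), which preserves escapes that are already present. The class name HttpUrlAssembler and the helpers enc and assemble are hypothetical. Note that URLEncoder applies form encoding, so a space becomes "+" rather than "%20".

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

public class HttpUrlAssembler {
    // Hypothetical helper: form-encodes one parameter name or value.
    static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    // Hypothetical helper: appends the encoded parameters to a base URL without a query.
    static URI assemble(String base, Map<String, String> params) {
        StringBuilder sb = new StringBuilder(base);
        char sep = '?';
        for (Map.Entry<String, String> e : params.entrySet()) {
            sb.append(sep).append(enc(e.getKey())).append('=').append(enc(e.getValue()));
            sep = '&';
        }
        // Parsing the complete string keeps the existing %XX escapes as they are.
        return URI.create(sb.toString());
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("a", "b&c");
        params.put("q", "x y");
        System.out.println(assemble("http://example.com/servlet", params));
        // prints: http://example.com/servlet?a=b%26c&q=x+y
    }
}
```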

Hontvári Levente asked Nov 11 '13 22:11



1 Answer

The query string

a=b&c

is not wrong in a URI. The RFC on URI Generic Syntax (RFC 2396) states

The query component is a string of information to be interpreted by the resource.

  query         = *uric

Within a query component, the characters ";", "/", "?", ":", "@",
"&", "=", "+", ",", and "$" are reserved.

The character & in the query string is very much valid (uric represents reserved, mark, and alphanumeric characters). The RFC also states

Many URI include components consisting of or delimited by, certain
special characters. These characters are called "reserved", since
their usage within the URI component is limited to their reserved
purpose. If the data for a URI component would conflict with the
reserved purpose, then the conflicting data must be escaped before
forming the URI.

Because the & is valid but reserved, it is up to the user to determine if it is meant to be encoded or not.

What you call a query parameter is not a feature of a URI and therefore the URI class has no reason to (and shouldn't) support it.
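As a small illustration of that division of labour (the class name RawQueryDemo and its helpers are made up for this sketch), parsing a complete URI string leaves the choice between & and %26 exactly as the caller wrote it. getRawQuery returns the query as written, while getQuery returns it with escapes decoded.

```java
import java.net.URI;

public class RawQueryDemo {
    // Hypothetical helpers around the two query accessors of java.net.URI.
    static String rawQuery(String uri) {
        return URI.create(uri).getRawQuery();
    }

    static String query(String uri) {
        return URI.create(uri).getQuery();
    }

    public static void main(String[] args) {
        String literal = "http://example.com/servlet?a=b&c";
        String escaped = "http://example.com/servlet?a=b%26c";

        System.out.println(rawQuery(literal)); // a=b&c
        System.out.println(rawQuery(escaped)); // a=b%26c
        System.out.println(query(escaped));    // a=b&c (decoded, distinction lost)
    }
}
```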

Related:

  • Which characters make a URL invalid?
Sotirios Delimanolis answered Nov 02 '22 05:11