Strange URL encoding issue

Tags:

java

urlencode

I have a strange issue with URL-encoding a plus sign (+) as a query parameter in a request to an API. The API's documentation states:

The date has to be in the W3C format, e.g. '2016-10-24T13:33:23+02:00'.

So far so good. I'm using this code (simplified) to generate the URL with Spring's UriComponentsBuilder:

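import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import org.springframework.web.util.UriComponentsBuilder;

// "X" prints only the hour offset when the minutes are zero, e.g. "+01"
DateTimeFormatter formatter = DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ssX");
ZonedDateTime dateTime = ZonedDateTime.now().minusDays(1);
String formatted = dateTime.format(formatter);

UriComponentsBuilder uriComponentsBuilder = UriComponentsBuilder.fromUriString(baseUrl);
uriComponentsBuilder.queryParam("update", formatted);
// toUriString() builds and then encodes the components
String url = uriComponentsBuilder.toUriString();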
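Side note on the date format: the API's example writes the offset as '+02:00', while the pattern letter X prints only the hour ("+01") when the minutes are zero. A small JDK-only sketch comparing X, XXX and the built-in ISO_OFFSET_DATE_TIME formatter, using a fixed timestamp instead of now() so the output is reproducible (the class and method names are illustrative, not from the question):

```java
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class OffsetFormats {
    // Fixed timestamp matching the example URL in the question
    static final OffsetDateTime SAMPLE =
            OffsetDateTime.of(2017, 1, 5, 12, 40, 44, 0, ZoneOffset.ofHours(1));

    // "X": hour-only offset when the minutes are zero, e.g. "+01"
    static String withX(OffsetDateTime t) {
        return t.format(DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ssX"));
    }

    // "XXX": hours and minutes separated by a colon, e.g. "+01:00"
    static String withXXX(OffsetDateTime t) {
        return t.format(DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ssXXX"));
    }

    // Built-in ISO-8601 formatter; also prints "+01:00" for this value
    static String iso(OffsetDateTime t) {
        return t.format(DateTimeFormatter.ISO_OFFSET_DATE_TIME);
    }

    public static void main(String[] args) {
        System.out.println(withX(SAMPLE));   // 2017-01-05T12:40:44+01
        System.out.println(withXXX(SAMPLE)); // 2017-01-05T12:40:44+01:00
        System.out.println(iso(SAMPLE));     // 2017-01-05T12:40:44+01:00
    }
}
```

Either way the value still contains a +, so the encoding question below applies unchanged.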

The unencoded URL would look like this:

https://example.com?update=2017-01-05T12:40:44+01

Encoding it results in:

https://example.com?update=2017-01-05T12:40:44%2B01

which is, in my opinion, a correctly encoded query string: note the %2B replacing the + in +01 at the end.

However, when I send the request to the API using the encoded URL, I get an error saying the request could not be handled.

If, however, I replace the %2B with a + before sending the request, it works:

url = url.replace("%2B", "+"); // Strings are immutable, so the result must be reassigned

As I understand it, the + sign stands for a space, so after decoding, the URL the server actually sees must be:

https://example.com?update=2017-01-05T12:40:44 01
  • Am I right in this assumption?

  • Short of contacting the API's owner, is there anything I can do to make it work with the correctly encoded query, without resorting to strange, non-standard string replacements?
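On the first question: if a server decodes the query string as application/x-www-form-urlencoded data, which is what java.net.URLDecoder implements, then a literal + becomes a space while %2B becomes a +. A small sketch, assuming Java 10+ for the Charset overload of decode (the class and method names are illustrative):

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

public class FormDecodingDemo {
    // Decodes a value the way form-encoded data is decoded: "+" means a space
    static String formDecode(String raw) {
        return URLDecoder.decode(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // A literal "+" is read as a space
        System.out.println("[" + formDecode("2017-01-05T12:40:44+01") + "]");   // [2017-01-05T12:40:44 01]
        // "%2B" is read as a literal plus sign
        System.out.println("[" + formDecode("2017-01-05T12:40:44%2B01") + "]"); // [2017-01-05T12:40:44+01]
    }
}
```

Under that decoding, the %2B version would be the one that works, which is the opposite of what the API does; so the API apparently treats the raw query string differently, and only its owner can say exactly how.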

UPDATE:

According to RFC 3986 (Section 3.4), the + sign does not need to be encoded in a query parameter: the query grammar is built from pchar, and pchar includes the sub-delims, one of which is +.

3.4. Query

The query component contains non-hierarchical data that, along with data in the path component (Section 3.3), serves to identify a resource within the scope of the URI's scheme and naming authority (if any). The query component is indicated by the first question mark ("?") character and terminated by a number sign ("#") character or by the end of the URI.

  query       = *( pchar / "/" / "?" )

The characters slash ("/") and question mark ("?") may represent data within the query component. Beware that some older, erroneous implementations may not handle such data correctly when it is used as the base URI for relative references (Section 5.1), apparently because they fail to distinguish query data from path data when looking for hierarchical separators. However, as query components are often used to carry identifying information in the form of "key=value" pairs and one frequently used value is a reference to another URI, it is sometimes better for usability to avoid percent-encoding those characters.

(Berners-Lee et al., RFC 3986 URI Generic Syntax, January 2005)
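Applying that grammar by hand: pchar covers the unreserved characters (letters, digits, "-", ".", "_", "~"), the sub-delims, ":" and "@", and the query additionally allows "/" and "?". Below is a minimal sketch of a percent-encoder for a single query parameter value that leaves those characters as they are. It also encodes "&" and "=", because they separate key=value pairs; that is a convention, not part of RFC 3986 itself. The class and method names are hypothetical. Keep in mind that a server which form-decodes the query will still read the + as a space.

```java
import java.nio.charset.StandardCharsets;

public class QueryValueEncoder {
    // Punctuation allowed unencoded in a query (pchar / "/" / "?"),
    // minus "&" and "=", which delimit key=value pairs
    private static final String ALLOWED_PUNCT = "-._~!$'()*+,;:@/?";

    static boolean isAllowed(int b) {
        return (b >= 'a' && b <= 'z') || (b >= 'A' && b <= 'Z')
                || (b >= '0' && b <= '9') || ALLOWED_PUNCT.indexOf(b) >= 0;
    }

    // Percent-encodes every UTF-8 byte that may not appear as-is
    static String encode(String value) {
        StringBuilder out = new StringBuilder();
        for (byte raw : value.getBytes(StandardCharsets.UTF_8)) {
            int b = raw & 0xFF;
            if (isAllowed(b)) {
                out.append((char) b);
            } else {
                out.append('%').append(String.format("%02X", b));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("2017-01-05T12:40:44+01")); // 2017-01-05T12:40:44+01
        System.out.println(encode("a b&c=d"));                // a%20b%26c%3Dd
    }
}
```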

According to this answer on Stack Overflow, Spring's UriComponentsBuilder follows this specification, but apparently it doesn't really. So a new question would be: how can I make UriComponentsBuilder follow the spec?

Michael asked Jan 06 '17 12:01




2 Answers

You can use builder.build().toUriString(). Unlike the builder's own toUriString(), it does not call encode() on the components.

This worked for me.

Thanks

Sanjay answered Sep 28 '22 13:09


So it seems that Spring's UriComponentsBuilder encodes the whole URL. Passing false as the encoded flag to build() has no effect here, because toUriString() always encodes the URL: it explicitly calls encode() after build(false):

/**
 * Build a URI String. This is a shortcut method which combines calls
 * to {@link #build()}, then {@link UriComponents#encode()} and finally
 * {@link UriComponents#toUriString()}.
 * @since 4.1
 * @see UriComponents#toUriString()
 */
public String toUriString() {
    return build(false).encode().toUriString();
}

My solution (for now) is to manually encode only what really needs to be encoded. Another option (which might also require some encoding) is to get the URI and work with that from there on:

String url = uriComponentsBuilder.build().toUri().toString(); // returns the URL without turning + into %2B
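The same idea can be tried with the JDK alone: java.net.URI's five-argument constructor quotes characters that are illegal in a URI, such as a space, but leaves legal ones such as + untouched. A sketch (the class name and the example.com host are illustrative):

```java
import java.net.URI;
import java.net.URISyntaxException;

public class UriQueryDemo {
    // Builds https://example.com/?<query>; the five-argument constructor
    // quotes only characters that are not legal in a URI
    static String build(String query) {
        try {
            return new URI("https", "example.com", "/", query, null).toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(build("update=2017-01-05T12:40:44+01"));
        // https://example.com/?update=2017-01-05T12:40:44+01
        System.out.println(build("update=2017-01-05T12:40:44 01"));
        // https://example.com/?update=2017-01-05T12:40:44%2001
    }
}
```

It leaves & and = inside a value alone as well, so it only suits values that contain neither; and since these constructors always quote %, the value must not be percent-encoded beforehand.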
Michael answered Sep 28 '22 11:09