
HttpClient problem with URLs which include curly braces

I am using HttpClient in my Android application. At some point, I have to fetch data from remote locations. Below is a snippet showing how I use HttpClient to get the response.

String url_s = "https://mydomain.com/abc/{5D/{B0blhahblah-blah}I1.jpg"; // my URL string
DefaultHttpClient httpClient = new DefaultHttpClient();
HttpResponse response = httpClient.execute(new HttpGet(url_s));

It works absolutely fine in most cases, but not when the URL string contains curly braces. The stack trace points at the index of the curly brace and reports an invalid character. So I tried to create a URI from the URL instead.

URL url = new URL(url_s);
URI uri = url.toURI();
response = httpClient.execute(new HttpGet(uri));

After doing so, I didn't get any result from the remote location at all. I worked around the problem by replacing the curly braces:

  • "{" with "%7B"
  • "}" with "%7D"

But I am not totally satisfied with my solution. Are there any better solutions? Anything neat and not hard-coded like mine?

PH7 asked Jul 19 '11 01:07


2 Answers

The strict answer is that you should never have unencoded curly braces in your URL.

A full description of valid URLs can be found in RFC 1738.

The pertinent part for this answer is as follows:

Unsafe:

Characters can be unsafe for a number of reasons. The space
character is unsafe because significant spaces may disappear and
insignificant spaces may be introduced when URLs are transcribed or
typeset or subjected to the treatment of word-processing programs.
The characters "<" and ">" are unsafe because they are used as the
delimiters around URLs in free text; the quote mark (""") is used to
delimit URLs in some systems. The character "#" is unsafe and should
always be encoded because it is used in World Wide Web and in other
systems to delimit a URL from a fragment/anchor identifier that might
follow it. The character "%" is unsafe because it is used for
encodings of other characters. Other characters are unsafe because
gateways and other transport agents are known to sometimes modify
such characters. These characters are "{", "}", "|", "\", "^", "~",
"[", "]", and "`".

All unsafe characters must always be encoded within a URL. For
example, the character "#" must be encoded within URLs even in
systems that do not normally deal with fragment or anchor
identifiers, so that if the URL is copied into another system that
does use them, it will not be necessary to change the URL encoding.

To get around the problem you have been experiencing, you must encode your URL.
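As a rough illustration of what encoding means here, the sketch below percent-encodes the characters that RFC 1738 lists as unsafe in a raw path. The class and method names (UnsafeCharEncoder, encodePath) are made up for this example, and it assumes the input has not been encoded yet, because it would encode an existing "%" a second time.

```java
import java.nio.charset.StandardCharsets;

final class UnsafeCharEncoder {
    // The characters RFC 1738 calls unsafe, including the space.
    private static final String UNSAFE = " <>\"#%{}|\\^~[]`";

    /** Percent-encodes unsafe and non-printable bytes in a raw (not yet encoded) path. */
    static String encodePath(String rawPath) {
        StringBuilder out = new StringBuilder();
        for (byte b : rawPath.getBytes(StandardCharsets.UTF_8)) {
            int c = b & 0xFF;
            // Encode control characters, DEL, non-ASCII bytes and the unsafe set.
            if (c < 0x21 || c > 0x7E || UNSAFE.indexOf(c) >= 0) {
                out.append(String.format("%%%02X", c));
            } else {
                out.append((char) c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints /abc/%7B5D/%7BB0blhahblah-blah%7DI1.jpg
        System.out.println(encodePath("/abc/{5D/{B0blhahblah-blah}I1.jpg"));
    }
}
```

This is essentially the asker's manual replacement, generalised to the whole unsafe list. Note that "/" is left alone, so it only makes sense for the path part of a URL.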

The "host may not be null" error occurs when the entire URL is encoded, including the https://mydomain.com/ part, because the scheme and host can then no longer be recognised. You only want to encode the last part of the URL, called the path.
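A small sketch of that failure mode, using only java.net classes so that it runs outside Android. The class and method names (WholeUrlEncodingDemo, encodeWholeUrl) are hypothetical, and it only shows that the parsed URI ends up without a scheme or host; the exact error message depends on the HTTP client.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URLEncoder;

final class WholeUrlEncodingDemo {
    /** Encodes the complete URL, scheme and host included, then parses the result. */
    static URI encodeWholeUrl(String url) {
        try {
            // URLEncoder also escapes ':' and '/', which destroys "https://".
            return URI.create(URLEncoder.encode(url, "UTF-8"));
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        URI uri = encodeWholeUrl("https://mydomain.com/abc/{5D/{B0blhahblah-blah}I1.jpg");
        System.out.println(uri);             // starts with https%3A%2F%2Fmydomain.com
        System.out.println(uri.getScheme()); // null
        System.out.println(uri.getHost());   // null
    }
}
```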

The solution is to use the Uri.Builder class to build your URI from its individual parts. Its path() method encodes the path in the process, leaving the "/" characters intact.

You will find a detailed description in the Android SDK Uri.Builder reference documentation.

Some trivial examples using your values are:

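Uri.Builder b = Uri.parse("https://mydomain.com").buildUpon();
// path() percent-encodes everything except the '/' separators
b.path("/abc/{5D/{B0blhahblah-blah}I1.jpg");
Uri u = b.build();
// hand the encoded string to the client from the question
response = httpClient.execute(new HttpGet(u.toString()));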

Or you can use chaining:

Uri u = Uri.parse("https://mydomain.com").buildUpon()
        .path("/abc/{5D/{B0blhahblah-blah}I1.jpg")
        .build();
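If you want to try the same idea on a plain JVM, where android.net.Uri is not available, the multi-argument java.net.URI constructor also quotes illegal characters in the path. This is a swapped-in alternative rather than the Android API above, and the class and method names (PathEncodingSketch, build) are invented for the example.

```java
import java.net.URI;
import java.net.URISyntaxException;

final class PathEncodingSketch {
    /** Builds a URI from its parts; the constructor percent-encodes illegal path characters. */
    static URI build(String scheme, String host, String rawPath) {
        try {
            // Only the path is quoted; scheme and host are taken as given.
            return new URI(scheme, host, rawPath, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        URI uri = build("https", "mydomain.com", "/abc/{5D/{B0blhahblah-blah}I1.jpg");
        // prints https://mydomain.com/abc/%7B5D/%7BB0blhahblah-blah%7DI1.jpg
        System.out.println(uri);
    }
}
```

The resulting URI can be passed to new HttpGet(uri), as in the question. Be aware that this constructor always quotes "%", so it should be given the raw path, not one that is already encoded.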
Moog answered Sep 18 '22 13:09


Except that RFC 1738 has been obsolete for over a decade. It has been superseded by RFC 3986, and there is no indication in

https://www.rfc-editor.org/rfc/rfc3986

that curly braces are unsafe (in fact, the RFC does not contain a single curly brace character anywhere). Furthermore, I've tried URIs that contain curly braces in browsers, and they work fine.

Also note that the OP is using a class called URI, which should definitely be following RFC 3986 at the very least, if not RFC 3987.
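For what it's worth, the Javadoc of java.net.URI (the type returned by URL.toURI()) says it implements RFC 2396 as amended by RFC 2732, both of which predate RFC 3986. A quick way to see what that class accepts is the sketch below; the class and method names (BraceParsingCheck, isAccepted) are hypothetical.

```java
import java.net.URI;
import java.net.URISyntaxException;

final class BraceParsingCheck {
    /** Returns true if the single-argument java.net.URI parser accepts the string. */
    static boolean isAccepted(String candidate) {
        try {
            new URI(candidate);
            return true;
        } catch (URISyntaxException e) {
            // The exception message names the offending character index.
            return false;
        }
    }

    public static void main(String[] args) {
        // Raw braces versus the percent-encoded form of the same URL.
        System.out.println(isAccepted("https://mydomain.com/abc/{5D/{B0blhahblah-blah}I1.jpg"));
        System.out.println(isAccepted("https://mydomain.com/abc/%7B5D/%7BB0blhahblah-blah%7DI1.jpg"));
    }
}
```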

However, oddly, the IRI specification in

https://www.rfc-editor.org/rfc/rfc3987

has this note:

Systems accepting IRIs MAY also deal with the printable characters in
US-ASCII that are not allowed in URIs, namely "<", ">", '"', space,
"{", "}", "|", "\", "^", and "`", in step 2 above. If these
characters are found but are not converted, then the conversion
SHOULD fail. Please note that the number sign ("#"), the percent
sign ("%"), and the square bracket characters ("[", "]") are not part
of the above list and MUST NOT be converted.

In other words, it looks like the RFCs themselves have some issues.

user2077221 answered Sep 19 '22 13:09