
URL to URI encoding changes a "%3D" to "%253D"

I'm having trouble encoding a URL to a URI:

String mUrl = "A string url that needs to be encoded for use in a new HttpGet()";
URL url = new URL(mUrl);
URI uri = new URI(url.getProtocol(), url.getAuthority(), url.getPath(),
    url.getQuery(), null);

This does not do what I expect. Passing in this string:

http://m.bloomingdales.com/img?url=http%3A%2F%2Fimages.bloomingdales.com%2Fis%2Fimage%2FBLM%2Fproducts%2F3%2Foptimized%2F1140443_fpx.tif%3Fwid%3D52%26qlt%3D90%2C0%26layer%3Dcomp%26op_sharpen%3D0%26resMode%3Dsharp2%26op_usm%3D0.7%2C1.0%2C0.5%2C0%26fmt%3Djpeg&ttl=30d

Comes out as:

http://m.bloomingdales.com/img?url=http%253A%252F%252Fimages.bloomingdales.com%252Fis%252Fimage%252FBLM%252Fproducts%252F3%252Foptimized%252F1140443_fpx.tif%253Fwid%253D52%2526qlt%253D90%252C0%2526layer%253Dcomp%2526op_sharpen%253D0%2526resMode%253Dsharp2%2526op_usm%253D0.7%252C1.0%252C0.5%252C0%2526fmt%253Djpeg&ttl=30d

This is broken. For example, each %3D is turned into %253D. It seems to be doing something mysterious to the %'s already in the string.

What's going on and what am I doing wrong here?

cottonBallPaws asked Feb 01 '11 01:02



3 Answers

You are first putting the (already-escaped) string into the URL class. That doesn't escape anything. Then you are pulling out sections of the URL, which returns them without any further processing (so they are still escaped, since they were escaped when you put them in). Finally, you are putting the sections into the URI class, using the multi-argument constructor. This constructor is specified as percent-encoding the URI components it is given.

Therefore, it is in this final step that, for example, a space becomes "%20" (good) and "%3A" becomes "%253A" (bad). Since you are putting in URLs which are already-encoded*, you don't want to encode them again.
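To make the double-encoding concrete, here is a minimal sketch. The host example.com and the shortened query string are stand-ins for the URL in the question, and the class and method names are made up for illustration.

```java
import java.net.URI;
import java.net.URISyntaxException;

class DoubleEncodingDemo {
    // Rebuilds a URI with the five-argument constructor. That constructor
    // always quotes '%', so an existing escape such as "%3D" becomes "%253D".
    static String rebuild(String scheme, String authority, String path, String query) {
        try {
            return new URI(scheme, authority, path, query, null).toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // Already-escaped query: the '%' gets escaped a second time.
        System.out.println(rebuild("http", "example.com", "/img", "url=a%3Db&ttl=30d"));
        // Unescaped path with a space: encoded once, which is what we want.
        System.out.println(rebuild("http", "example.com", "/hello world", null));
    }
}
```

The first call shows the behaviour from the question; the second shows the case the constructor is designed for, where the input has not been escaped yet.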

Therefore, the single-argument constructor of URI is your friend. It doesn't escape anything, and requires that you pass a pre-escaped string. Hence, you don't need URL at all:

String mUrl = "A string url that is already percent-encoded for use in a new HttpGet()";
URI uri = new URI(mUrl);
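As a rough check of that claim, the sketch below parses a pre-escaped string with the single-argument constructor and compares the raw and decoded views of the query. The URL is a shortened, hypothetical stand-in for the one in the question, and the helper name is illustrative.

```java
import java.net.URI;
import java.net.URISyntaxException;

class SingleArgDemo {
    // Parses a string that is assumed to be percent-encoded already.
    static URI parse(String preEscaped) {
        try {
            return new URI(preEscaped);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        URI uri = parse("http://example.com/img?url=a%3Db&ttl=30d");
        System.out.println(uri);               // the string exactly as passed in
        System.out.println(uri.getRawQuery()); // escapes preserved
        System.out.println(uri.getQuery());    // escapes decoded
    }
}
```

The getRaw* accessors return the escaped text as given, while the plain getters decode it, which matters as soon as you feed the parts into another constructor.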

*The only problem is if your URLs are sometimes not percent-encoded, and sometimes they are. Then you have a bigger problem. You need to decide whether your program is starting out with a URL which is always encoded, or one which needs to be encoded.

Note that there is no such thing as a full URL which is not percent-encoded. For example, you can't take the full URL "http://example.com/bob&co" and somehow turn it into the properly-encoded URL "http://example.com/bob%26co" -- how can you tell the difference between the syntax (which shouldn't be escaped) and the characters (which should)? This is why the single-argument form of URI requires that strings are already-escaped. If you have unescaped strings, you need to percent-encode them before inserting them into the full URL syntax, and that is what the multi-argument constructor of URI helps you do.
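A sketch of encoding a component before it goes into the full URL. One assumption to flag: java.net.URI's multi-argument constructors only quote characters that are illegal in a URI, so a reserved character such as "&" inside a query value is left alone. This example therefore uses java.net.URLEncoder for the value, which applies form encoding (a space becomes "+"). The class, method, and host names are illustrative.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLEncoder;

class EncodeComponentsDemo {
    // Percent-encodes one raw query value (form encoding, UTF-8).
    static String encodeQueryValue(String raw) {
        try {
            return URLEncoder.encode(raw, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // Encodes the value first, then assembles the full URL syntax around it.
    static String build(String host, String path, String name, String rawValue) {
        String url = "http://" + host + path + "?" + name + "=" + encodeQueryValue(rawValue);
        try {
            return new URI(url).toString(); // single-argument form: no re-encoding
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(build("example.com", "/img", "url", "http://a.example/x?w=1&q=2"));
    }
}
```

Only the value is encoded; the "?" and "=" that belong to the outer URL's syntax are added afterwards, which is the distinction the paragraph above describes.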

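Edit: I missed the fact that the original code discards the fragment. If you want to remove the fragment (or any other part) of the URL, you can construct the URI as above, then pull all the parts out as required (they will be decoded into regular strings), then pass them back into the URI multi-argument constructor (where characters that are illegal in a URI will be re-encoded):

uri = new URI(uri.getScheme(), uri.getUserInfo(), uri.getHost(), uri.getPort(),
              uri.getPath(), uri.getQuery(), null);  // Remove fragment

Be aware that this round trip can be lossy for URLs like the one in the question: reserved characters that are legal in a query, such as "&" and "=", are not re-escaped, so a decoded "%26" comes back as a plain "&". If that matters, work from the raw getters (getRawPath(), getRawQuery()) and rebuild the string instead.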
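The sketch below contrasts two ways of dropping the fragment: the decoded round trip through the seven-argument constructor, and a rebuild from the raw (still escaped) parts. It assumes a hierarchical URI with an authority; the class and method names and the example.com URL are hypothetical.

```java
import java.net.URI;
import java.net.URISyntaxException;

class StripFragmentDemo {
    // Decoded round trip: getQuery() decodes "%26" to "&", and the
    // constructor does not re-escape it because "&" is legal in a query.
    static String decodedRoundTrip(String s) {
        URI uri = URI.create(s);
        try {
            return new URI(uri.getScheme(), uri.getUserInfo(), uri.getHost(),
                           uri.getPort(), uri.getPath(), uri.getQuery(), null).toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Raw rebuild: concatenates the escaped parts and leaves the fragment out.
    static String stripFragment(String s) {
        URI uri = URI.create(s);
        StringBuilder sb = new StringBuilder();
        sb.append(uri.getScheme()).append("://")
          .append(uri.getRawAuthority())
          .append(uri.getRawPath());
        if (uri.getRawQuery() != null) {
            sb.append('?').append(uri.getRawQuery());
        }
        return URI.create(sb.toString()).toString();
    }

    public static void main(String[] args) {
        String in = "http://example.com/img?url=a%26b%3Dc&ttl=30d#top";
        System.out.println(decodedRoundTrip(in)); // inner escapes are lost
        System.out.println(stripFragment(in));    // inner escapes are kept
    }
}
```

For a URL that carries another URL as a parameter value, only the raw rebuild keeps the inner escapes intact.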
mgiuca answered Oct 06 '22 04:10


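%3D is the percent-encoding of "=" (the equals sign, byte 3D in hex, 61 in decimal).

%253D is %3D encoded a second time: %25 is the percent-encoding of the "%" character itself, so %253D decodes once to %3D and twice to "=".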
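A short way to see this, sketched with java.net.URLDecoder (the class and method names here are illustrative): decoding "%253D" once gives "%3D", and decoding that result again gives "=".

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

class DecodeTwiceDemo {
    // Decodes one layer of percent-encoding using UTF-8.
    static String decodeOnce(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        String once = decodeOnce("%253D");
        String twice = decodeOnce(once);
        System.out.println(once);
        System.out.println(twice);
    }
}
```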

Sarat Patel answered Oct 06 '22 06:10


The URL class didn't decode the %-sequences when it parsed the URL, but the URI class is encoding them (again). Use URI to parse the URL string.

Javadocs:

http://download.oracle.com/javase/6/docs/api/java/net/URL.html

The URL class does not itself encode or decode any URL components according to the escaping mechanism defined in RFC2396. It is the responsibility of the caller to encode any fields, which need to be escaped prior to calling URL, and also to decode any escaped fields, that are returned from URL. Furthermore, because URL has no knowledge of URL escaping, it does not recognise equivalence between the encoded or decoded form of the same URL. For example, the two URLs:

http://foo.com/hello world/ and http://foo.com/hello%20world

would be considered not equal to each other. Note, the URI class does perform escaping of its component fields in certain circumstances.

The recommended way to manage the encoding and decoding of URLs is to use URI, and to convert between these two classes using toURI() and URI.toURL().
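A small sketch of that conversion, using the two example URLs from the Javadoc excerpt; the class and helper names are illustrative. URL.toURI() does no escaping of its own, so an already-escaped URL passes through unchanged, while one containing a raw space is rejected.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

class UrlToUriDemo {
    // Converts via URL.toURI(); checked exceptions are wrapped so callers
    // see a single unchecked exception for any malformed input.
    static URI toUri(String s) {
        try {
            return new URL(s).toURI();
        } catch (MalformedURLException | URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // Escaped input: accepted, escapes preserved.
        System.out.println(toUri("http://foo.com/hello%20world"));
        // Unescaped space: URL accepts it, but toURI() rejects it.
        try {
            toUri("http://foo.com/hello world/");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getCause().getClass().getSimpleName());
        }
    }
}
```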

Bert F answered Oct 06 '22 05:10