Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Java URL encoding of query string parameters

Say I have a URL

http://example.com/query?q= 

and I have a query entered by the user such as:

random word £500 bank $

I want the result to be a properly encoded URL:

http://example.com/query?q=random%20word%20%A3500%20bank%20%24 

What's the best way to achieve this? I tried URLEncoder and creating URI/URL objects but none of them come out quite right.

like image 826
user1277546 Avatar asked May 28 '12 14:05

user1277546


People also ask

Should query parameters be URL encoded?

Why do we need to encode? URLs can only have certain characters from the standard 128 character ASCII set. Reserved characters that do not belong to this set must be encoded. This means that we need to encode these characters when passing them into a URL.


2 Answers

URLEncoder is the way to go. You only need to keep in mind to encode only the individual query string parameter name and/or value, not the entire URL, for sure not the query string parameter separator character & nor the parameter name-value separator character =.

String q = "random word £500 bank $"; String url = "https://example.com?q=" + URLEncoder.encode(q, StandardCharsets.UTF_8); 

When you're still not on Java 10 or newer, then use StandardCharsets.UTF_8.toString() as charset argument, or when you're still not on Java 7 or newer, then use "UTF-8".


Note that spaces in query parameters are represented by +, not %20, which is legitimately valid. The %20 is usually to be used to represent spaces in URI itself (the part before the URI-query string separator character ?), not in query string (the part after ?).

Also note that there are three encode() methods. One without Charset as second argument and another with String as second argument which throws a checked exception. The one without Charset argument is deprecated. Never use it and always specify the Charset argument. The javadoc even explicitly recommends to use the UTF-8 encoding, as mandated by RFC3986 and W3C.

All other characters are unsafe and are first converted into one or more bytes using some encoding scheme. Then each byte is represented by the 3-character string "%xy", where xy is the two-digit hexadecimal representation of the byte. The recommended encoding scheme to use is UTF-8. However, for compatibility reasons, if an encoding is not specified, then the default encoding of the platform is used.

See also:

  • What every web developer must know about URL encoding
like image 178
BalusC Avatar answered Sep 29 '22 04:09

BalusC


I would not use URLEncoder. Besides being incorrectly named (URLEncoder has nothing to do with URLs), inefficient (it uses a StringBuffer instead of Builder and does a couple of other things that are slow) Its also way too easy to screw it up.

Instead I would use URIBuilder or Spring's org.springframework.web.util.UriUtils.encodeQuery or Commons Apache HttpClient. The reason being you have to escape the query parameters name (ie BalusC's answer q) differently than the parameter value.

The only downside to the above (that I found out painfully) is that URL's are not a true subset of URI's.

Sample code:

import org.apache.http.client.utils.URIBuilder;  URIBuilder ub = new URIBuilder("http://example.com/query"); ub.addParameter("q", "random word £500 bank \$"); String url = ub.toString();  // Result: http://example.com/query?q=random+word+%C2%A3500+bank+%24 

Since I'm just linking to other answers I marked this as a community wiki. Feel free to edit.

like image 32
10 revs, 7 users 60% Avatar answered Sep 29 '22 06:09

10 revs, 7 users 60%