This ensures that characters that would otherwise have special meaning don't interfere. In your case %20 is immediately recognisable as a whitespace character - while not really having any meaning in a URI it is encoded in order to avoid breaking the string into multiple "parts".
A space is assigned number 32, which is 20 in hexadecimal. When you see “%20,” it represents a space in an encoded URL, for example, http://www.example.com/products%20and%20services.html.
Note that Java's URLEncoder class encodes space character( " " ) into a + sign. This is contrary to other languages like Javascript that encode space character into %20 . Check out this StackOverflow discussion to learn what it means to encode the space character into %20 or + sign.
URL encoding replaces unsafe ASCII characters with a "%" followed by two hexadecimal digits. URLs cannot contain spaces. URL encoding normally replaces a space with a plus (+) sign or with %20.
This behaves as expected. The URLEncoder
implements the HTML Specifications for how to encode URLs in HTML forms.
From the javadocs:
This class contains static methods for converting a String to the application/x-www-form-urlencoded MIME format.
and from the HTML Specification:
application/x-www-form-urlencoded
Forms submitted with this content type must be encoded as follows:
- Control names and values are escaped. Space characters are replaced by `+'
You will have to replace it, e.g.:
System.out.println(java.net.URLEncoder.encode("Hello World", "UTF-8").replace("+", "%20"));
A space is encoded to %20
in URLs, and to +
in forms submitted data (content type application/x-www-form-urlencoded). You need the former.
Using Guava:
dependencies {
compile 'com.google.guava:guava:23.0'
// or, for Android:
compile 'com.google.guava:guava:23.0-android'
}
You can use UrlEscapers:
String encodedString = UrlEscapers.urlFragmentEscaper().escape(inputString);
Don't use String.replace, this would only encode the space. Use a library instead.
This class perform application/x-www-form-urlencoded
-type encoding rather than percent encoding, therefore replacing with
+
is a correct behaviour.
From javadoc:
When encoding a String, the following rules apply:
- The alphanumeric characters "a" through "z", "A" through "Z" and "0" through "9" remain the same.
- The special characters ".", "-", "*", and "_" remain the same.
- The space character " " is converted into a plus sign "+".
- All other characters are unsafe and are first converted into one or more bytes using some encoding scheme. Then each byte is represented by the 3-character string "%xy", where xy is the two-digit hexadecimal representation of the byte. The recommended encoding scheme to use is UTF-8. However, for compatibility reasons, if an encoding is not specified, then the default encoding of the platform is used.
Encode Query params
org.apache.commons.httpclient.util.URIUtil
URIUtil.encodeQuery(input);
OR if you want to escape chars within URI
public static String escapeURIPathParam(String input) {
StringBuilder resultStr = new StringBuilder();
for (char ch : input.toCharArray()) {
if (isUnsafe(ch)) {
resultStr.append('%');
resultStr.append(toHex(ch / 16));
resultStr.append(toHex(ch % 16));
} else{
resultStr.append(ch);
}
}
return resultStr.toString();
}
private static char toHex(int ch) {
return (char) (ch < 10 ? '0' + ch : 'A' + ch - 10);
}
private static boolean isUnsafe(char ch) {
if (ch > 128 || ch < 0)
return true;
return " %$&+,/:;=?@<>#%".indexOf(ch) >= 0;
}
Hello+World
is how a browser will encode form data (application/x-www-form-urlencoded
) for a GET
request and this is the generally accepted form for the query part of a URI.
http://host/path/?message=Hello+World
If you sent this request to a Java servlet, the servlet would correctly decode the parameter value. Usually the only time there are issues here is if the encoding doesn't match.
Strictly speaking, there is no requirement in the HTTP or URI specs that the query part to be encoded using application/x-www-form-urlencoded
key-value pairs; the query part just needs to be in the form the web server accepts. In practice, this is unlikely to be an issue.
It would generally be incorrect to use this encoding for other parts of the URI (the path for example). In that case, you should use the encoding scheme as described in RFC 3986.
http://host/Hello%20World
More here.
Just been struggling with this too on Android, managed to stumble upon Uri.encode(String, String) while specific to android (android.net.Uri) might be useful to some.
static String encode(String s, String allow)
https://developer.android.com/reference/android/net/Uri.html#encode(java.lang.String, java.lang.String)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With