I've been experimenting with various bits of Java code trying to come up with something that will encode a string containing quotes, spaces and "exotic" Unicode characters and produce output that's identical to JavaScript's encodeURIComponent function.
My torture test string is: "A" B ± "
If I enter the following JavaScript statement in Firebug:
encodeURIComponent('"A" B ± "');

Then I get:

"%22A%22%20B%20%C2%B1%20%22"

Here's my little test Java program:

import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class EncodingTest
{
  public static void main(String[] args) throws UnsupportedEncodingException
  {
    String s = "\"A\" B ± \"";
    System.out.println("URLEncoder.encode returns "
      + URLEncoder.encode(s, "UTF-8"));

    System.out.println("getBytes returns "
      + new String(s.getBytes("UTF-8"), "ISO-8859-1"));
  }
}

This program outputs:
URLEncoder.encode returns %22A%22+B+%C2%B1+%22
getBytes returns "A" B Â± "
Close, but no cigar! What is the best way of encoding a UTF-8 string using Java so that it produces the same output as JavaScript's encodeURIComponent?
EDIT: I'm using Java 1.4, moving to Java 5 shortly.
encodeURIComponent should be used to encode a URI component, i.e. a string that is meant to become part of a URL, such as a single query-string value. encodeURI should be used to encode a complete URI or an existing URL.

The difference between them is which characters they leave alone. encodeURI does not escape the reserved characters ; , / ? : @ & = + $ or #, so the scheme ("http://"), host, path separators and query delimiters of a full URL survive intact. encodeURIComponent escapes those characters as well, which is what you want for a value that may itself contain "&", "=" or "/".

decodeURI() reverses encodeURI(): it leaves escape sequences for reserved characters alone, so "%2F" stays "%2F". decodeURIComponent() reverses encodeURIComponent() and decodes every escape sequence.

Simply put, URL encoding (percent-encoding) replaces each character that may not appear literally with the bytes of its UTF-8 representation, each written as "%" followed by two hex digits (a space becomes %20, "±" becomes %C2%B1), so that the result uses only characters the URI syntax allows and can be interpreted unambiguously.
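To make the character sets concrete, here is a minimal sketch of percent-encoding done by hand in Java, without URLEncoder. It is not from the original thread: the class name PercentEncoder and its constants are hypothetical, and it assumes Java 7 or later for StandardCharsets. Each UTF-8 byte is kept if it is an ASCII letter, digit or one of the "safe" characters, and written as %XX otherwise; the only difference between the two JavaScript functions is the safe set.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of any library: hand-rolled percent-encoding.
class PercentEncoder {
    // Besides ASCII letters and digits, encodeURIComponent leaves these alone.
    static final String COMPONENT_SAFE = "-_.!~*'()";
    // encodeURI also keeps the reserved characters and '#',
    // so the structure of a complete URL survives.
    static final String URI_SAFE = COMPONENT_SAFE + ";,/?:@&=+$#";

    static String percentEncode(String s, String safe) {
        StringBuilder out = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            int c = b & 0xFF; // treat the byte as unsigned 0..255
            boolean alnum = (c >= 'a' && c <= 'z')
                    || (c >= 'A' && c <= 'Z')
                    || (c >= '0' && c <= '9');
            if (alnum || (c < 0x80 && safe.indexOf(c) >= 0)) {
                out.append((char) c);
            } else {
                // Uppercase hex digits, as JavaScript produces them.
                out.append('%').append(String.format("%02X", c));
            }
        }
        return out.toString();
    }

    static String encodeURIComponent(String s) {
        return percentEncode(s, COMPONENT_SAFE);
    }

    static String encodeURI(String s) {
        return percentEncode(s, URI_SAFE);
    }

    public static void main(String[] args) {
        String url = "http://example.com/a b?q=\u00B1";
        System.out.println(encodeURI(url));          // keeps :, / and ?
        System.out.println(encodeURIComponent(url)); // escapes them too
    }
}
```

Run on the torture string, encodeURIComponent here should give the same %22A%22%20B%20%C2%B1%20%22 that Firebug shows. One known gap: JavaScript throws a URIError for an unpaired surrogate, while String.getBytes silently substitutes a replacement byte, so this sketch does not reproduce that error.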
This is the class I came up with in the end:
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

/**
 * Utility class for JavaScript compatible UTF-8 encoding and decoding.
 *
 * @see http://stackoverflow.com/questions/607176/java-equivalent-to-javascripts-encodeuricomponent-that-produces-identical-output
 * @author John Topley
 */
public class EncodingUtil
{
  /**
   * Decodes the passed UTF-8 String using an algorithm that's compatible with
   * JavaScript's <code>decodeURIComponent</code> function. Returns
   * <code>null</code> if the String is <code>null</code>.
   *
   * @param s The UTF-8 encoded String to be decoded
   * @return the decoded String
   */
  public static String decodeURIComponent(String s)
  {
    if (s == null)
    {
      return null;
    }

    String result = null;

    try
    {
      result = URLDecoder.decode(s, "UTF-8");
    }

    // This exception should never occur.
    catch (UnsupportedEncodingException e)
    {
      result = s;
    }

    return result;
  }

  /**
   * Encodes the passed String as UTF-8 using an algorithm that's compatible
   * with JavaScript's <code>encodeURIComponent</code> function. Returns
   * <code>null</code> if the String is <code>null</code>.
   *
   * @param s The String to be encoded
   * @return the encoded String
   */
  public static String encodeURIComponent(String s)
  {
    // URLEncoder.encode throws a NullPointerException for null input,
    // so honour the documented contract explicitly.
    if (s == null)
    {
      return null;
    }

    String result = null;

    try
    {
      // URLEncoder writes spaces as '+' and escapes ! ' ( ) ~,
      // which encodeURIComponent leaves as they are.
      result = URLEncoder.encode(s, "UTF-8")
                         .replaceAll("\\+", "%20")
                         .replaceAll("\\%21", "!")
                         .replaceAll("\\%27", "'")
                         .replaceAll("\\%28", "(")
                         .replaceAll("\\%29", ")")
                         .replaceAll("\\%7E", "~");
    }

    // This exception should never occur.
    catch (UnsupportedEncodingException e)
    {
      result = s;
    }

    return result;
  }

  /**
   * Private constructor to prevent this class from being instantiated.
   */
  private EncodingUtil()
  {
    super();
  }
}
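One behaviour the decoding method above does not match: URLDecoder implements HTML form decoding, so it turns "+" into a space, whereas JavaScript's decodeURIComponent("a+b") returns "a+b" unchanged. A minimal sketch of one way around that, assuming every literal "+" in the input should be treated as data, is to escape it as "%2B" before decoding. The class name JsDecode is hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical helper illustrating the '+' difference; not a library API.
class JsDecode {
    static String decodeURIComponent(String s) {
        if (s == null) {
            return null;
        }
        try {
            // URLDecoder maps '+' to ' ' (HTML form rules); JavaScript keeps '+',
            // so turn each literal '+' into its escape before decoding.
            return URLDecoder.decode(s.replaceAll("\\+", "%2B"), "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform must support UTF-8, so this is unreachable.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(decodeURIComponent("a+b%20c")); // prints a+b c
    }
}
```

Malformed escapes such as "%zz" still make URLDecoder throw an IllegalArgumentException, which is roughly where JavaScript would throw a URIError.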
Looking at the implementation differences, I see that:

MDC on encodeURIComponent():
literal characters (regex representation): [-a-zA-Z0-9._*~'()!]

Java 1.5.0 documentation on URLEncoder:
literal characters (regex representation): [-a-zA-Z0-9._*]
the space character " " is converted into a plus sign "+".

So basically, to get the desired result, use URLEncoder.encode(s, "UTF-8") and then do some post-processing:
- replace all occurrences of "+" with "%20"
- replace all occurrences of "%xx" representing any of [~'()!] back to their literal counterparts
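Since the question mentions moving to Java 5: from Java 5 on, String.replace(CharSequence, CharSequence) does literal replacement, so the post-processing steps above can be written without regular expressions. A minimal sketch, with the hypothetical class name JsEncode:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper: URLEncoder plus the post-processing described above.
class JsEncode {
    static String encodeURIComponent(String s) {
        if (s == null) {
            return null;
        }
        try {
            return URLEncoder.encode(s, "UTF-8")
                    .replace("+", "%20")  // URLEncoder writes spaces as '+'
                    .replace("%21", "!")  // restore the characters that
                    .replace("%27", "'")  // encodeURIComponent leaves
                    .replace("%28", "(")  // unescaped but URLEncoder
                    .replace("%29", ")")  // escapes
                    .replace("%7E", "~");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform must support UTF-8, so this is unreachable.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(encodeURIComponent("\"A\" B \u00B1 \""));
    }
}
```

The "+" rule is safe because URLEncoder escapes a literal "+" as %2B, so every "+" in its output came from a space. Likewise "%" is escaped as %25, so an input of "%21" becomes "%2521" and is not touched by the "!" rule.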