Is there a common Java library that will handle URL encoding/decoding for a collection of strings?

Tags: java, urlencode, urldecode

I often have to URL-encode or URL-decode a large collection or array of strings. Besides iterating through them and calling the static URLDecoder.decode(string, "UTF-8"), are there any libraries that would make this kind of operation faster?

A colleague insists that using the static method to decode the strings in-place is not thread safe. Why would that be?

928

asked May 01 '12 18:05

jrws

1 Answer

The JDK's URLDecoder is not implemented efficiently. Most notably, it relies internally on StringBuffer, which introduces synchronization that URLDecoder does not need. Apache Commons Codec provides URLCodec, but it has been reported to have similar performance issues; I haven't verified whether that is still the case in the most recent version.

Mark A. Ziesemer wrote a post a while back regarding the issues and performance with URLDecoder. He logged some bug reports and ended up writing a complete replacement. Because this is SO, I'll quote some key excerpts here, but you should really read the entire source article here: http://blogger.ziesemer.com/2009/05/improving-url-coder-performance-java.html

Selected quotes:

Java provides a default implementation of this functionality in java.net.URLEncoder and java.net.URLDecoder. Unfortunately, it is not the best performing, due to both how the API was written as well as details within the implementation. A number of performance-related bugs have been filed on sun.com in relation to URLEncoder.

There is an alternative: org.apache.commons.codec.net.URLCodec from Apache Commons Codec. (Commons Codec also provides a useful implementation for Base64 encoding.) Unfortunately, Commons' URLCodec suffers some of the same issues as Java's URLEncoder/URLDecoder.

...

Recommendations for both the JDK and Commons:

When constructing any of the "buffer" classes, e.g. ByteArrayOutputStream, CharArrayWriter, StringBuilder, or StringBuffer, estimate and pass in an estimated capacity. The JDK's URLEncoder currently does this for its StringBuffer, but should do this for its CharArrayWriter instance as well. Commons' URLCodec should do this for its ByteArrayOutputStream instance. If the classes' default buffer sizes are too small, they may have to resize by copying into new, larger buffers - which isn't exactly a "cheap" operation. If the classes' default buffer sizes are too large, memory may be unnecessarily wasted.

Both implementations are dependent on Charsets, but only accept them as their String name. Charset provides a simple and small cache for name lookups - storing only the last 2 Charsets used. This should not be relied upon, and both should accept Charset instances for other interoperability reasons as well.

Both implementations only handle fixed-size inputs and outputs. The JDK's URLEncoder only works with String instances. Commons' URLCodec is also based on Strings, but also works with byte[] arrays. This is a design-level constraint that essentially prevents efficient processing of larger or variable-length inputs. Instead, the "stream-supporting" interfaces such as CharSequence, Appendable, and java.nio's Buffer implementations of ByteBuffer and CharBuffer should be supported.
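To make the three recommendations above concrete, here is a minimal sketch of a decoder that takes a Charset instance, reads from a CharSequence, writes to any Appendable, and pre-sizes its buffers. It is not code from the quoted post and not the com.ziesemer.utils.urlCodec API; the class name SketchUrlDecoder and its methods are hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch for illustration; not the API of any existing library.
final class SketchUrlDecoder {

    /** Decodes form-style percent-encoded text from {@code in} and appends it to {@code out}. */
    static void decode(CharSequence in, Charset charset, Appendable out) throws IOException {
        int n = in.length();
        // Every escape is three chars long, so n / 3 bytes is an upper bound.
        ByteBuffer pending = ByteBuffer.allocate(n / 3);
        int i = 0;
        while (i < n) {
            char c = in.charAt(i);
            if (c == '%') {
                if (i + 2 >= n) {
                    throw new IllegalArgumentException("Incomplete escape at index " + i);
                }
                int hi = hex(in.charAt(i + 1));
                int lo = hex(in.charAt(i + 2));
                if (hi < 0 || lo < 0) {
                    throw new IllegalArgumentException("Invalid escape at index " + i);
                }
                // Collect consecutive escaped bytes so multi-byte characters decode together.
                pending.put((byte) ((hi << 4) | lo));
                i += 3;
            } else {
                flush(pending, charset, out);
                out.append(c == '+' ? ' ' : c);
                i++;
            }
        }
        flush(pending, charset, out);
    }

    /** Convenience overload that returns a String. */
    static String decode(CharSequence in, Charset charset) {
        // Capacity hint: decoded text is normally no longer than the encoded input.
        StringBuilder sb = new StringBuilder(in.length());
        try {
            decode(in, charset, sb);
        } catch (IOException e) {
            throw new UncheckedIOException(e); // StringBuilder itself never throws this
        }
        return sb.toString();
    }

    // Turns the collected bytes into chars using the given Charset instance.
    private static void flush(ByteBuffer pending, Charset charset, Appendable out)
            throws IOException {
        if (pending.position() > 0) {
            pending.flip();
            out.append(charset.decode(pending));
            pending.clear();
        }
    }

    // Returns the value of an ASCII hex digit, or -1 if c is not one.
    private static int hex(char c) {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(decode("caf%C3%A9+au+lait", StandardCharsets.UTF_8));
    }
}
```

Because the output side is an Appendable, the same method can write into a StringBuilder, a Writer, or a CharBuffer without building an intermediate String. I have not benchmarked this sketch, so treat it as an illustration of the API shape rather than a faster replacement.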

...

Note that com.ziesemer.utils.urlCodec is over 3x as fast as the JDK URLEncoder, and over 1.5x as fast as the JDK URLDecoder. (The JDK's URLDecoder was faster than the URLEncoder, so there wasn't as much room for improvement.)

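I think your colleague is wrong to suggest URLDecoder is not thread-safe. As far as I can tell, the static decode() method works only on its arguments and local buffers, and Java strings are immutable, so nothing is actually decoded "in-place". Other answers here explain in detail.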
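As a small illustration of that point, the sketch below calls URLDecoder.decode from several threads at once through a parallel stream. It assumes Java 10 or later for the decode(String, Charset) overload; the class name ConcurrentDecodeDemo and the method decodeAll are hypothetical.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical demo class; requires Java 10+ for decode(String, Charset).
final class ConcurrentDecodeDemo {

    /** Decodes every entry, letting the stream split the work across threads. */
    static List<String> decodeAll(List<String> encoded) {
        return encoded.parallelStream()
                // Each call touches only its own argument and local state.
                .map(s -> URLDecoder.decode(s, StandardCharsets.UTF_8))
                // Collecting an ordered stream keeps the input order.
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(decodeAll(List.of("a%20b", "c%2Fd", "e+f")));
    }
}
```

Whether the parallel version is actually faster depends on the size of the collection and the length of the strings; for small batches a plain loop is likely to do just as well. Note that decode throws IllegalArgumentException on malformed escapes, which would abort the whole batch here.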

EDIT [2012-07-03] - Per later comment posted by OP

Not sure if you were looking for more ideas or not? You are correct that if you intend to operate on the list as an atomic collection, you would have to synchronize all access to the list, including references from outside your method. However, if you are okay with the returned list's contents potentially differing from the original list, then a brute-force approach for operating on a "batch" of strings from a collection that might be modified by other threads could look something like this:

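import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.ArrayList;
import java.util.List;

/**
 * @param origList will be copied by this method so that origList can continue
 *                 to be read and written by other threads.
 * @return list containing the decoded strings for each entry that was
 *         in origList at the time of the copy.
 */
public List<String> decodeListOfStringSafely(List<String> origList)
        throws UnsupportedEncodingException {
    // Snapshot first; later changes to origList do not affect the result.
    List<String> snapshotList = new ArrayList<String>(origList);
    // Pre-size the result so it never has to grow.
    List<String> newList = new ArrayList<String>(snapshotList.size());

    for (String urlStr : snapshotList) {
        // "UTF-8" is the canonical charset name; "UTF8" is only an alias.
        String decodedUrlStr = URLDecoder.decode(urlStr, "UTF-8");
        newList.add(decodedUrlStr);
    }

    return newList;
}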

If that does not help, then I'm still not sure what you are after, and you would be better served by posting a new, more focused question. If that is what you were asking about, then be careful, because this example taken out of context is not a good idea for many reasons.
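One of those reasons is that the copy step itself is only safe if the list tolerates concurrent access. The sketch below assumes the shared list was created with Collections.synchronizedList and takes the snapshot while holding the list's lock, which is the pattern the JDK documentation recommends for traversing such a list. The class name SnapshotDecodeDemo is hypothetical, and the Charset overload of decode requires Java 10 or later.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical demo class; assumes the caller shares a synchronizedList.
final class SnapshotDecodeDemo {

    /** Copies {@code shared} under its lock, then decodes the copy without holding the lock. */
    static List<String> decodeSnapshot(List<String> shared) {
        List<String> snapshot;
        synchronized (shared) { // same monitor the synchronized wrapper locks on
            snapshot = new ArrayList<>(shared);
        }
        // Other threads may modify 'shared' from here on; the snapshot is unaffected.
        List<String> decoded = new ArrayList<>(snapshot.size());
        for (String s : snapshot) {
            decoded.add(URLDecoder.decode(s, StandardCharsets.UTF_8));
        }
        return decoded;
    }

    public static void main(String[] args) {
        List<String> shared =
                Collections.synchronizedList(new ArrayList<>(List.of("a%20b", "x%3Dy")));
        System.out.println(decodeSnapshot(shared));
    }
}
```

Holding the lock only for the copy keeps the critical section short; the decoding, which is the slow part, runs without blocking writers.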

162

answered Nov 05 '22 10:11

kaliatech
