
Scala or Java Library for fixing malformed URIs

Does anyone know of a good Scala or Java library that can fix common problems in malformed URIs, such as containing characters that should be escaped but aren't?

Erik Engbrecht asked Oct 02 '11 20:10


2 Answers

I've tested a few libraries, including the now-legacy URIUtil of HTTPClient, without finding a viable solution. I have, however, usually had enough success with this kind of java.net.URI construct:

import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

/**
 * Tries to construct a URI by breaking the URL up into its smallest elements
 * and encoding each component individually using the full URI constructor:
 *
 *    foo://example.com:8042/over/there?name=ferret#nose
 *    \_/   \______________/\_________/ \_________/ \__/
 *     |           |            |            |        |
 *  scheme     authority       path        query   fragment
 */
public URI parseUrl(String s) throws MalformedURLException, URISyntaxException {
    URL u = new URL(s);
    // The multi-argument constructor quotes illegal characters per component.
    return new URI(
        u.getProtocol(),
        u.getAuthority(),
        u.getPath(),
        u.getQuery(),
        u.getRef());
}

This may be used in combination with the following routine, which repeatedly decodes a URL until the decoded string no longer changes. That can be useful against, e.g., double encoding. Note that, to keep it simple, this sample doesn't include any failsafe.

public String urlDecode(String url, String encoding) throws UnsupportedEncodingException, IllegalArgumentException {
    String result = URLDecoder.decode(url, encoding);
    return result.equals(url) ? result : urlDecode(result, encoding);
}
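
As a rough sketch of how the two routines might be combined: the multi-argument java.net.URI constructors always quote the '%' character, so input that is already percent-encoded would be encoded a second time unless it is decoded first. The class name UriRepairSketch, the maxPasses cap and the sample URLs below are illustrative assumptions, not part of the original routines. Decoding the whole URL before splitting it can also change its structure (for example if an encoded '?' or '#' is present, or a literal '+', which URLDecoder turns into a space), so treat this as a starting point only.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URL;
import java.net.URLDecoder;

class UriRepairSketch {

    // Same idea as parseUrl above: split the URL, then let the
    // multi-argument URI constructor quote illegal characters.
    static URI parseUrl(String s) throws Exception {
        URL u = new URL(s);
        return new URI(u.getProtocol(), u.getAuthority(), u.getPath(),
                       u.getQuery(), u.getRef());
    }

    // Iterative variant of urlDecode. maxPasses is a hypothetical guard
    // that bounds how many decoding passes are attempted.
    static String urlDecode(String url, String encoding, int maxPasses)
            throws UnsupportedEncodingException {
        String current = url;
        for (int i = 0; i < maxPasses; i++) {
            String decoded = URLDecoder.decode(current, encoding);
            if (decoded.equals(current)) {
                break; // nothing left to decode
            }
            current = decoded;
        }
        return current;
    }

    public static void main(String[] args) throws Exception {
        // Unescaped spaces in path, query and fragment.
        System.out.println(parseUrl("http://example.com/a b?q=x y#f g"));

        // Double-encoded space: decode fully first, then re-encode once.
        String decoded = urlDecode("http://example.com/a%2520b", "UTF-8", 5);
        System.out.println(parseUrl(decoded));
    }
}
```

With the cap in place, a pathological input can no longer cause unbounded recursion. The result is simply whatever the last pass produced.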
Johan Sjöberg answered Nov 19 '22 12:11


I would advise against using java.net.URLEncoder for percent-encoding URIs. Despite the name, it is not well suited to encoding URLs: it does not follow RFC 3986 and instead encodes to the application/x-www-form-urlencoded MIME format, in which, for example, a space becomes "+" rather than "%20".
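
To make the difference concrete, here is a small sketch comparing java.net.URLEncoder with the path quoting done by a multi-argument java.net.URI constructor. The class and method names (EncoderComparison, formEncode, pathEncode) are made up for this illustration.

```java
import java.net.URI;
import java.net.URLEncoder;

class EncoderComparison {

    // application/x-www-form-urlencoded: meant for form data, not URI paths.
    static String formEncode(String s) throws Exception {
        return URLEncoder.encode(s, "UTF-8");
    }

    // Lets java.net.URI quote characters that are illegal in a path.
    // The path is expected to start with '/'.
    static String pathEncode(String path) throws Exception {
        return new URI(null, null, path, null).getRawPath();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(formEncode("a b"));  // prints a+b
        System.out.println(pathEncode("/a b")); // prints /a%20b
    }
}
```

A server that treats "+" in a path as a literal plus sign would see a different resource name in the first case, which is the kind of mismatch the warning above is about.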

For encoding URIs in Scala I would recommend the Uri class from spray-http. scala-uri is an alternative (disclaimer: I'm the author).

theon answered Nov 19 '22 10:11