Is there a Java package or library that performs standard URL normalization?
5 Components of URL Representation
http://www[dot]example[dot]com:8040/folder/exist?name=sky#head
The 3 types of standard URL normalization (RFC 3986, section 6.2)
Syntax-Based Normalization
Scheme-Based Normalization
Protocol-Based Normalization
As others have mentioned, java.net.URL and/or java.net.URI are some obvious starting points.
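To give a rough idea of how far java.net.URI alone can take you, here is a minimal sketch covering part of the syntax-based and scheme-based steps. The class name UrlNormalizer and its normalize method are hypothetical, not part of any library, and the default-port table only covers http and https. It does not normalize percent-encoding, so treat it as a starting point rather than a complete implementation.

```java
import java.net.URI;
import java.util.Locale;

// Hypothetical helper for illustration; not part of the JDK or any library.
final class UrlNormalizer {

    private UrlNormalizer() {}

    static String normalize(String input) {
        // URI.normalize() removes "." and ".." path segments (syntax-based).
        URI uri = URI.create(input).normalize();

        String host = uri.getHost();
        if (uri.isOpaque() || host == null) {
            // Nothing more we can safely do for mailto:, urn:, relative paths, etc.
            return uri.toString();
        }

        // Scheme and host are case-insensitive, so lowercase them (syntax-based).
        String scheme = uri.getScheme() == null
                ? null
                : uri.getScheme().toLowerCase(Locale.ROOT);

        // Drop the port when it is the default for the scheme (scheme-based).
        // Assumption: only http and https are handled here.
        int port = uri.getPort();
        boolean defaultPort = ("http".equals(scheme) && port == 80)
                || ("https".equals(scheme) && port == 443);

        // An empty path is equivalent to "/" when an authority is present.
        String path = uri.getRawPath();
        if (path == null || path.isEmpty()) {
            path = "/";
        }

        // Reassemble from the raw components so existing escapes are not re-encoded.
        StringBuilder sb = new StringBuilder();
        if (scheme != null) {
            sb.append(scheme).append(':');
        }
        sb.append("//");
        if (uri.getRawUserInfo() != null) {
            sb.append(uri.getRawUserInfo()).append('@');
        }
        sb.append(host.toLowerCase(Locale.ROOT));
        if (port != -1 && !defaultPort) {
            sb.append(':').append(port);
        }
        sb.append(path);
        if (uri.getRawQuery() != null) {
            sb.append('?').append(uri.getRawQuery());
        }
        if (uri.getRawFragment() != null) {
            sb.append('#').append(uri.getRawFragment());
        }
        return sb.toString();
    }
}
```

With this sketch, UrlNormalizer.normalize("HTTP://WWW.Example.COM:80/a/./b/../c?name=sky#head") returns "http://www.example.com/a/c?name=sky#head", while a URL with a non-default port such as 8040 keeps its port. Protocol-based normalization (for example, following redirects) needs network access and is out of scope here.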
Here are some other options:
Galimatias (Spanish for "gibberish") appears to be an opinionated and relatively popular URL normalization library for Java. The source code can be found at github.com/smola/galimatias.
The project describes its motivation as follows: "galimatias started out of frustration with java.net.URL and java.net.URI. Both of them are good for basic use cases, but severely broken for others."
The github.com/sentric/url-normalization library provides another (unusual, in my opinion) approach where it reverses the domain portion; e.g. "com.stackoverflow" instead of "stackoverflow.com".
You can find other variations on GitHub, some of them implemented in other languages such as Python, Ruby, and PHP.