I have an input String, say "Please go to http://stackoverflow.com". The URL part of the String is detected and an anchor <a href=""></a> is automatically added by many browsers, IDEs, and applications, so it becomes:
Please go to <a href='http://stackoverflow.com'>http://stackoverflow.com</a>
I need to do the same using Java.
Use a regex to find URLs in a string. In the JavaScript function, we define the urlRegex variable, which holds the regex for matching URLs. It checks for http or https, then for the slashes and the text after them. The g flag at the end of the regex makes the search return every URL in the string rather than only the first.
To find the URLs in a given string in Python, use the findall() function from the regular expression module (re). It returns all non-overlapping matches of the pattern in the string as a list of strings. The string is scanned left to right, and matches are returned in the order found.
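Since the question asks for Java, here is a rough Java counterpart of the "find all matches" idea described above: a loop over Matcher.find(), which also returns non-overlapping matches left to right. The class name UrlFinder, the method findUrls, and the deliberately loose pattern https?://\S+ are assumptions made for illustration; they are not taken from the snippets above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UrlFinder {
    // Illustrative pattern only: "http" or "https", then "://", then any run of non-whitespace.
    private static final Pattern URL_PATTERN = Pattern.compile("https?://\\S+");

    // Collects every non-overlapping match, in the order found.
    static List<String> findUrls(String text) {
        List<String> urls = new ArrayList<>();
        Matcher m = URL_PATTERN.matcher(text);
        while (m.find()) {
            urls.add(m.group());
        }
        return urls;
    }

    public static void main(String[] args) {
        // prints [http://stackoverflow.com, https://example.org/a?b=c]
        System.out.println(findUrls("Please go to http://stackoverflow.com or https://example.org/a?b=c now"));
    }
}
```

A pattern this loose will also swallow trailing punctuation such as a closing parenthesis or a full stop, so it is only a starting point.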
Hey, why not use the core Java class for this, java.net.URL, and let it validate the URL?
While the following code violates the golden principle "use exceptions for exceptional conditions only", it does not make sense to me to reinvent the wheel for something that is very mature on the Java platform.
Here's the code:
import java.net.MalformedURLException;
import java.net.URL;

// Replaces URLs with HTML anchor tags
public class URLInString {
    public static void main(String[] args) {
        String s = args[0];
        // Separate input by spaces (URLs don't have spaces)
        String[] parts = s.split("\\s+");
        // Attempt to convert each item into a URL
        for (String item : parts) {
            try {
                URL url = new URL(item);
                // If possible, then replace with an anchor...
                System.out.print("<a href=\"" + url + "\">" + url + "</a> ");
            } catch (MalformedURLException e) {
                // If it was not a URL, print the item unchanged
                System.out.print(item + " ");
            }
        }
        System.out.println();
    }
}
Using the following input:
"Please go to http://stackoverflow.com and then mailto:[email protected] to download a file from ftp://user:pass@someserver/someFile.txt"
Produces the following output:
Please go to <a href="http://stackoverflow.com">http://stackoverflow.com</a> and then <a href="mailto:[email protected]">mailto:[email protected]</a> to download a file from <a href="ftp://user:pass@someserver/someFile.txt">ftp://user:pass@someserver/someFile.txt</a>
Of course, different protocols could be handled in different ways. You can get all the info with the getters of the URL class, for instance:
url.getProtocol();
Or the rest of the attributes: host, port, file, query, ref, etc.
http://java.sun.com/javase/6/docs/api/java/net/URL.html
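As a small illustration of those getters, the sketch below prints the main parts of one URL. The class name UrlParts, the helper describe, and the example.com address are made up for the example. It uses the same new URL(String) constructor as the answer; recent JDKs (20 and later) mark that constructor as deprecated in favor of URI.create(...).toURL(), but it still works.

```java
import java.net.MalformedURLException;
import java.net.URL;

class UrlParts {
    // Builds a one-line summary of a URL using the standard java.net.URL getters.
    static String describe(String spec) throws MalformedURLException {
        URL url = new URL(spec);
        return "protocol=" + url.getProtocol()
                + ", host=" + url.getHost()
                + ", port=" + url.getPort()   // -1 when the URL has no explicit port
                + ", path=" + url.getPath()
                + ", query=" + url.getQuery()
                + ", ref=" + url.getRef();
    }

    public static void main(String[] args) throws MalformedURLException {
        // prints protocol=http, host=example.com, port=8080, path=/docs/index.html, query=lang=en, ref=top
        System.out.println(describe("http://example.com:8080/docs/index.html?lang=en#top"));
    }
}
```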
This handles all the protocols (at least all of those the Java platform is aware of). As an extra benefit, if a kind of URL that Java does not currently recognize is eventually incorporated into the URL class (through a library update), you'll get it transparently!
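The flip side is that a scheme the platform does not know is rejected just like a plain word. A minimal sketch of that check, assuming a default JVM with no custom protocol handlers installed (the class ProtocolCheck, the helper hasKnownProtocol, and the made-up scheme foo are hypothetical):

```java
import java.net.MalformedURLException;
import java.net.URL;

class ProtocolCheck {
    // Returns true when java.net.URL accepts the text, i.e. it has a protocol the JVM recognizes.
    static boolean hasKnownProtocol(String item) {
        try {
            new URL(item);
            return true;
        } catch (MalformedURLException e) {
            // Thrown both for "no protocol" and for "unknown protocol"
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(hasKnownProtocol("http://stackoverflow.com")); // true
        System.out.println(hasKnownProtocol("foo://bar"));                // false: unknown protocol
        System.out.println(hasKnownProtocol("Please"));                   // false: no protocol at all
    }
}
```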
While it's not Java specific, Jeff Atwood recently posted an article about the pitfalls you might run into when trying to locate and match URLs in arbitrary text:
The Problem With URLs
It gives a good regex, along with the snippet of code you need to handle parentheses properly (more or less).
The regex:
\(?\bhttp://[-A-Za-z0-9+&@#/%?=~_()|!:,.;]*[-A-Za-z0-9+&@#/%=~_()|]
The paren cleanup (C#):
if (s.StartsWith("(") && s.EndsWith(")"))
{
    return s.Substring(1, s.Length - 2);
}
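Since the question is about Java, here is one way the regex and the paren cleanup above might be ported and combined to produce the anchors. This is a sketch, not the article's code: the class name Linkifier and the method linkify are invented for the example, and the choice to put the stripped parentheses back outside the anchor is an assumption. Note that C#'s Substring(1, s.Length - 2) takes a start and a length, while Java's substring(1, s.length() - 1) takes a start and an end index.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Linkifier {
    // The regex quoted above, with backslashes doubled for a Java string literal.
    private static final Pattern URL_PATTERN = Pattern.compile(
            "\\(?\\bhttp://[-A-Za-z0-9+&@#/%?=~_()|!:,.;]*[-A-Za-z0-9+&@#/%=~_()|]");

    static String linkify(String text) {
        Matcher m = URL_PATTERN.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String s = m.group();
            String prefix = "";
            String suffix = "";
            // Paren cleanup: a match wrapped in ( ) is treated as a parenthesized URL.
            if (s.startsWith("(") && s.endsWith(")")) {
                s = s.substring(1, s.length() - 1);
                prefix = "(";
                suffix = ")";
            }
            String anchor = prefix + "<a href=\"" + s + "\">" + s + "</a>" + suffix;
            // quoteReplacement keeps any '$' or '\' in the URL from being read as a group reference.
            m.appendReplacement(out, Matcher.quoteReplacement(anchor));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(linkify("See (http://example.com/foo) and http://stackoverflow.com."));
    }
}
```

The sketch does no HTML escaping of the URL, and the pattern as quoted only matches http://, so https or ftp links would need the scheme part widened.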