How to detect the presence of URL in a string

Tags:

java

url

I have an input String, say "Please go to http://stackoverflow.com". Many browsers, IDEs, and applications detect the URL part of such a String and automatically wrap it in an anchor (<a href=""></a>), so it becomes: Please go to <a href='http://stackoverflow.com'>http://stackoverflow.com</a>.

I need to do the same using Java.

Rakesh N asked Nov 12 '08 22:11


2 Answers

Use java.net.URL for that!

Why not use the core Java class java.net.URL and let it validate the URL?

The following code violates the golden principle "use exceptions for exceptional conditions only", but it does not make sense to me to reinvent the wheel for something that is very mature on the Java platform.

Here's the code:

    import java.net.MalformedURLException;
    import java.net.URL;

    // Replaces URLs with HTML anchors
    public class URLInString {
        public static void main(String[] args) {
            String s = args[0];
            // Separate the input by whitespace (URLs don't contain spaces)
            String[] parts = s.split("\\s+");

            // Attempt to convert each item into a URL
            for (String item : parts) {
                try {
                    URL url = new URL(item);
                    // It parsed as a URL, so replace it with an anchor
                    System.out.print("<a href=\"" + url + "\">" + url + "</a> ");
                } catch (MalformedURLException e) {
                    // Not a URL: print the item unchanged
                    System.out.print(item + " ");
                }
            }

            System.out.println();
        }
    }

Using the following input:

"Please go to http://stackoverflow.com and then mailto:[email protected] to download a file from    ftp://user:pass@someserver/someFile.txt" 

Produces the following output:

Please go to <a href="http://stackoverflow.com">http://stackoverflow.com</a> and then <a href="mailto:[email protected]">mailto:[email protected]</a> to download a file from    <a href="ftp://user:pass@someserver/someFile.txt">ftp://user:pass@someserver/someFile.txt</a> 

Of course, different protocols could be handled in different ways. You can get all the details through the getters of the URL class, for instance:

    url.getProtocol();

Or the rest of the attributes: host, port, file, query, ref, and so on:

http://java.sun.com/javase/6/docs/api/java/net/URL.html
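
As a rough sketch of what those getters return, something like the following could be used. The class name UrlParts and the helper describe are made up for illustration, and the second URL is just an example input, not something from the question.

```java
import java.net.MalformedURLException;
import java.net.URL;

public class UrlParts {
    // Hypothetical helper: summarises the parts that java.net.URL exposes.
    @SuppressWarnings("deprecation") // the URL(String) constructor is deprecated since Java 20
    public static String describe(String spec) {
        try {
            URL url = new URL(spec);
            return "protocol=" + url.getProtocol()
                + " userInfo=" + url.getUserInfo()   // null when absent
                + " host=" + url.getHost()
                + " port=" + url.getPort()           // -1 when no port is given
                + " path=" + url.getPath()
                + " query=" + url.getQuery()         // null when absent
                + " ref=" + url.getRef();            // null when absent
        } catch (MalformedURLException e) {
            // Unknown protocol or unparsable input
            throw new IllegalArgumentException("Not a URL: " + spec, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(describe("ftp://user:pass@someserver/someFile.txt"));
        System.out.println(describe("http://stackoverflow.com/questions?tab=newest#top"));
    }
}
```

With these, you could, for example, only create anchors when getProtocol() returns a protocol you actually want to link.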

java.net.URL handles all the protocols (at least all of those the Java platform is aware of), and as an extra benefit, if a protocol that Java does not currently recognize is eventually incorporated into the URL class (through a library update), you'll get it transparently!

OscarRyz answered Sep 26 '22 06:09


While it's not Java-specific, Jeff Atwood recently posted an article about the pitfalls you might run into when trying to locate and match URLs in arbitrary text:

The Problem With URLs

It gives a good regex, along with the snippet of code you need to handle parens properly (more or less).

The regex:

\(?\bhttp://[-A-Za-z0-9+&@#/%?=~_()|!:,.;]*[-A-Za-z0-9+&@#/%=~_()|] 

The paren cleanup (C#):

    if (s.StartsWith("(") && s.EndsWith(")"))
    {
        return s.Substring(1, s.Length - 2);
    }
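
Since the question asks for Java, here is one way the regex and the paren cleanup might be combined using java.util.regex. This is only a sketch: the class name UrlLinker and the method linkify are made up for illustration, the regex is the one quoted above (so it only matches http://), and writing the stripped parens back outside the anchor is my own assumption about the desired output.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UrlLinker {
    // The regex quoted above, escaped for a Java string literal (http:// only).
    private static final Pattern URL_PATTERN = Pattern.compile(
        "\\(?\\bhttp://[-A-Za-z0-9+&@#/%?=~_()|!:,.;]*[-A-Za-z0-9+&@#/%=~_()|]");

    // Hypothetical helper: wraps every matched URL in an anchor.
    public static String linkify(String text) {
        Matcher m = URL_PATTERN.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String url = m.group();
            String prefix = "";
            String suffix = "";
            // Paren cleanup: a match wrapped in "(...)" loses both parens,
            // which are written back outside the anchor (an assumption).
            if (url.startsWith("(") && url.endsWith(")")) {
                url = url.substring(1, url.length() - 1);
                prefix = "(";
                suffix = ")";
            }
            String anchor = prefix + "<a href=\"" + url + "\">" + url + "</a>" + suffix;
            // quoteReplacement keeps '$' and '\' in the URL from being
            // interpreted as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(anchor));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(linkify("Please go to http://stackoverflow.com."));
        System.out.println(linkify("See the docs (http://example.com/foo) first."));
    }
}
```

Note that Java's substring takes an end index, while the second argument of C#'s Substring is a length, hence length() - 1 here versus Length - 2 above. The sketch does not HTML-escape the URL (for example an & in a query string), which you would probably want before emitting real markup.
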

Michael Burr answered Sep 26 '22 06:09