Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Regular expression to match URLs in Java

I use RegexBuddy while working with regular expressions. From its library I copied the regular expression to match URLs. I tested successfully within RegexBuddy. However, when I copied it as Java String flavor and pasted it into Java code, it does not work. The following class prints false:

public class RegexFoo {      public static void main(String[] args) {         String regex = "\\b(https?|ftp|file)://[-A-Z0-9+&@#/%?=~_|!:,.;]*[-A-Z0-9+&@#/%=~_|]";         String text = "http://google.com";         System.out.println(IsMatch(text,regex)); }      private static boolean IsMatch(String s, String pattern) {         try {             Pattern patt = Pattern.compile(pattern);             Matcher matcher = patt.matcher(s);             return matcher.matches();         } catch (RuntimeException e) {         return false;     }        }    } 

Does anyone know what I am doing wrong?

like image 953
Sergio del Amo Avatar asked Oct 02 '08 16:10

Sergio del Amo


People also ask

How do you check the URL is valid or not?

You can use the URLConstructor to check if a string is a valid URL. URLConstructor ( new URL(url) ) returns a newly created URL object defined by the URL parameters. A JavaScript TypeError exception is thrown if the given URL is not valid.

What is a regular expression URL?

URL regular expressions can be used to verify if a string has a valid URL format as well as to extract an URL from a string.

What does \\ mean in Java regex?

String regex = "\\."; Notice that the regular expression String contains two backslashes after each other, and then a . . The reason is, that first the Java compiler interprets the two \\ characters as an escaped Java String character. After the Java compiler is done, only one \ is left, as \\ means the character \ .

How do you match in regex?

To match a character having special meaning in regex, you need to use a escape sequence prefix with a backslash ( \ ). E.g., \. matches "." ; regex \+ matches "+" ; and regex \( matches "(" . You also need to use regex \\ to match "\" (back-slash).


1 Answers

Try the following regex string instead. Your test was probably done in a case-sensitive manner. I have added the lowercase alphas as well as a proper string beginning placeholder.

String regex = "^(https?|ftp|file)://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]"; 

This works too:

String regex = "\\b(https?|ftp|file)://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]"; 

Note:

String regex = "<\\b(https?|ftp|file)://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]>"; // matches <http://google.com>  String regex = "<^(https?|ftp|file)://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]>"; // does not match <http://google.com> 
like image 167
TomC Avatar answered Sep 29 '22 18:09

TomC