
Regular expression pattern to match URL with or without http://www

Tags: regex, php

I'm not very good at regular expressions at all.

I've been relying on a lot of framework code to date, but I can't find a pattern that matches a URL like http://www.example.com/etcetc and also catches www.example.com/etcetc and example.com/etcetc.

Edmund Rojas asked Jun 21 '11 15:06




2 Answers

For matching all kinds of URLs, the following code should work:

<?php
// Each "$" inside the character classes is escaped as "\$" so that PHP
// does not treat "$_" as a variable inside these double-quoted strings.
$regex  = "((https?|ftp)://)?"; // Scheme
$regex .= "([a-z0-9+!*(),;?&=\$_.-]+(:[a-z0-9+!*(),;?&=\$_.-]+)?@)?"; // User and Pass
$regex .= "([a-z0-9\-\.]*)\.(([a-z]{2,4})|([0-9]{1,3}\.([0-9]{1,3})\.([0-9]{1,3})))"; // Host or IP
$regex .= "(:[0-9]{2,5})?"; // Port
$regex .= "(/([a-z0-9+\$_%-]\.?)+)*/?"; // Path
$regex .= "(\?[a-z+&\$_.-][a-z0-9;:@&%=+/\$_.-]*)?"; // GET Query
$regex .= "(#[a-z_.-][a-z0-9+\$%_.-]*)?"; // Anchor
?>

Then, the correct way to check against the regex is as follows:

<?php
// Anchor the pattern with ^ and $ and match case-insensitively (i flag).
if (preg_match("~^$regex$~i", 'www.example.com/etcetc', $m)) {
    var_dump($m);
}

if (preg_match("~^$regex$~i", 'http://www.example.com/etcetc', $m)) {
    var_dump($m);
}
?>

Courtesy: Comments made by splattermania in the PHP manual: http://php.net/manual/en/function.preg-match.php
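As a rough sketch of how the pattern above could be packaged for reuse, the same pieces can be wrapped in a small helper. The function names `build_url_regex` and `looks_like_url` are hypothetical and not part of the original answer or the PHP manual comment; single-quoted strings are used here only so that PHP never tries to interpolate `$_`.

```php
<?php
// Hypothetical helper (an assumption, not from the original answer):
// assembles the same pattern pieces and returns a delimited, anchored regex.
function build_url_regex(): string
{
    $regex  = '((https?|ftp)://)?';                                      // scheme
    $regex .= '([a-z0-9+!*(),;?&=$_.-]+(:[a-z0-9+!*(),;?&=$_.-]+)?@)?';  // user and password
    $regex .= '([a-z0-9\-\.]*)\.(([a-z]{2,4})|([0-9]{1,3}\.([0-9]{1,3})\.([0-9]{1,3})))'; // host or IP
    $regex .= '(:[0-9]{2,5})?';                                          // port
    $regex .= '(/([a-z0-9+$_%-]\.?)+)*/?';                               // path
    $regex .= '(\?[a-z+&$_.-][a-z0-9;:@&%=+/$_.-]*)?';                   // query string
    $regex .= '(#[a-z_.-][a-z0-9+$%_.-]*)?';                             // fragment

    // "~" is the delimiter, so the "/" characters need no escaping.
    return '~^' . $regex . '$~i';
}

// Hypothetical wrapper: true when the whole string matches the pattern.
function looks_like_url(string $candidate): bool
{
    return preg_match(build_url_regex(), $candidate) === 1;
}

var_dump(looks_like_url('http://www.example.com/etcetc'));
var_dump(looks_like_url('www.example.com/etcetc'));
var_dump(looks_like_url('example.com/etcetc'));
var_dump(looks_like_url('not a url'));
```

The first three calls should report a match and the last one should not. Note that the `[a-z]{2,4}` part limits the top-level domain to four letters, so a host such as example.museum is rejected by this pattern.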

RegEx Demo in regex101

anubhava answered Sep 30 '22 07:09


This worked for me in all cases I had tested:

$url_pattern = '/((http|https)\:\/\/)?[a-zA-Z0-9\.\/\?\:@\-_=#]+\.([a-zA-Z0-9\&\.\/\?\:@\-_=#])*/'; 

Tests:

http://test.test-75.1474.stackoverflow.com/
https://www.stackoverflow.com
https://www.stackoverflow.com/
http://wwww.stackoverflow.com/
http://wwww.stackoverflow.com

http://test.test-75.1474.stackoverflow.com/
http://www.stackoverflow.com
http://www.stackoverflow.com/
stackoverflow.com/
stackoverflow.com

http://www.example.com/etcetc
www.example.com/etcetc
example.com/etcetc
user:[email protected]/etcetc

example.com/etcetc?query=aasd
example.com/etcetc?query=aasd&dest=asds

http://stackoverflow.com/questions/6427530/regular-expression-pattern-to-match-url-with-or-without-http-www
http://stackoverflow.com/questions/6427530/regular-expression-pattern-to-match-url-with-or-without-http-www/

Every valid Internet URL contains at least one dot, so the pattern above simply looks for at least two strings joined by a dot, each made up of characters that are valid in a URL.
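Because this pattern has no `^` or `$` anchors, it can also pull URL-like substrings out of longer text. The following is a minimal sketch of that use; the function name `extract_url_candidates` is hypothetical and not part of the original answer.

```php
<?php
// Hypothetical helper (an assumption, not from the original answer):
// returns every substring of $text that the unanchored pattern matches.
function extract_url_candidates(string $text): array
{
    $url_pattern = '/((http|https)\:\/\/)?[a-zA-Z0-9\.\/\?\:@\-_=#]+\.([a-zA-Z0-9\&\.\/\?\:@\-_=#])*/';

    // $matches[0] holds the full matches; the other entries hold the groups.
    preg_match_all($url_pattern, $text, $matches);

    return $matches[0];
}

print_r(extract_url_candidates(
    'See example.com/etcetc and http://www.example.com/etcetc for details'
));
```

The pattern is deliberately permissive: the part after the dot may be empty, so an ordinary word followed by a full stop (for example `sentence.`) is also reported as a candidate. If a whole string has to be validated rather than searched, anchoring the pattern with `^` and `$` is one way to tighten it.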

H Aßdøµ answered Sep 30 '22 07:09