
Only match URL beginning with 'www' or 'http(s)://' and nothing else

Tags:

regex

I am using a regular expression on my blog site to turn URLs into clickable links, which works great. The pattern has this format:

/(href=")?([-a-zA-Z0-9@:%_\+.~#?&\/\/=]{2,256}\.[a-z]{2,4}\b(\/?[-a-zA-Z0-9@:%_\+.~#?&\/\/=]+)?)/

So what's the problem?

Recently I found that this pattern also matches filenames, so when a user posts a filename in a comment, the system turns it into a link.
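
A minimal sketch that reproduces the problem. Python is an assumption here (the question does not say which language the blog uses), and the names `ORIGINAL` and `linkable` are hypothetical helpers, not part of the original code. The pattern is the one quoted above, with the surrounding `/` delimiters removed.

```python
import re

# The pattern from the question, split over two lines for readability.
ORIGINAL = re.compile(
    r'(href=")?([-a-zA-Z0-9@:%_\+.~#?&\/\/=]{2,256}\.[a-z]{2,4}\b'
    r'(\/?[-a-zA-Z0-9@:%_\+.~#?&\/\/=]+)?)'
)

def linkable(text):
    """Return every substring the original pattern would turn into a link."""
    return [m.group(2) for m in ORIGINAL.finditer(text)]

# A bare filename is picked up exactly like a real URL.
print(linkable("see filename.php for details"))
print(linkable("visit www.mysite.com today"))
```

Nothing in the pattern requires a scheme or a `www.` prefix, so anything shaped like `name.ext` qualifies.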

And what am I trying to achieve?

I want to match only URLs that start with http(s):// or www., so that plain names such as mysite.com or filename.php are not highlighted.


Inputs that should be matched:

+--------------------------+------------------------------------------------------+
|         Example          |                     Explanation                      |
+--------------------------+------------------------------------------------------+
| http(s)://www.mysite.com | because it starts with http(s):// and has URL format |
| www.mysite.com           | because it starts with www. and has URL format       |
+--------------------------+------------------------------------------------------+

Inputs that shouldn't be matched:

+-------------------+--------------------------------------------------+
|      Example      |                    Explanation                   |
+-------------------+--------------------------------------------------+
| mysite.com        | because it doesn't start with http(s):// or www. |
|                   | even though it has URL format                    |
| http(s)://mytext  | because it doesn't have URL format               |
| http://localhost/ | because it doesn't have URL format               |
+-------------------+--------------------------------------------------+

What does the URL format look like?

For this case, the URL format can be described by this pattern:

([-a-zA-Z0-9_.]{2,256}\.[a-z]{2,4}\b(\/?[-a-zA-Z0-9:%_\+.~#?&\/=]+)?)

Examples:

google.com, google.co.uk, accounts.google.com, google.com/somepath/ ...
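
As a rough check of the URL-format pattern, here is a small Python sketch (Python and the names `URL_FORMAT` and `has_url_format` are assumptions for illustration). The pattern is written with balanced parentheses and is tested against the whole string.

```python
import re

# URL-format pattern from the question, written with balanced parentheses.
URL_FORMAT = re.compile(
    r'([-a-zA-Z0-9_.]{2,256}\.[a-z]{2,4}\b'
    r'(\/?[-a-zA-Z0-9:%_\+.~#?&\/=]+)?)'
)

def has_url_format(text):
    """True if the whole string fits the URL-format pattern."""
    return URL_FORMAT.fullmatch(text) is not None

samples = ["google.com", "google.co.uk", "accounts.google.com",
           "google.com/somepath/", "localhost", "filename.php"]
for sample in samples:
    print(sample, has_url_format(sample))
```

Note that `filename.php` also fits this format, which is why the format alone cannot tell a filename from a URL and a required prefix is needed.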

Conclusion

I tried adding the string www\. to this pattern, but then nothing matched at all. So how can I edit this regex to match URLs beginning with 'www' or 'http(s)://' and nothing else?

Thanks in advance.

Lkopo asked Nov 01 '22 17:11



1 Answer

This regexp is definitely not perfect, but it will do what you want:

(http[s]?:\/\/|www\.|ftp:\/\/){1,2}([-a-zA-Z0-9_]{2,256}\.[a-z]{2,4}\b(\/?[-a-zA-Z0-9@:%_\+.~#?&\/=]+)?)

It can be tricked into matching non-URLs, but that can't really be abused. Making the pattern smarter greatly increases its complexity.
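
A quick way to try the pattern against the inputs from the question, sketched in Python (the language and the names `PREFIXED_URL` and `find_links` are assumptions for illustration). The dot after `www` is written as `\.` here so that it only matches a literal dot.

```python
import re

# The pattern from this answer: a required prefix, then the URL format.
PREFIXED_URL = re.compile(
    r'(http[s]?:\/\/|www\.|ftp:\/\/){1,2}'
    r'([-a-zA-Z0-9_]{2,256}\.[a-z]{2,4}\b'
    r'(\/?[-a-zA-Z0-9@:%_\+.~#?&\/=]+)?)'
)

def find_links(text):
    """Return every prefixed URL found in the text."""
    return [m.group(0) for m in PREFIXED_URL.finditer(text)]

should_match = ["http://www.mysite.com", "https://www.mysite.com",
                "www.mysite.com"]
should_not_match = ["mysite.com", "http://mytext", "https://mytext",
                    "http://localhost/", "filename.php"]

for sample in should_match + should_not_match:
    print(sample, find_links(sample))

# Inside running text, only the prefixed URL is picked up.
print(find_links("see filename.php or www.mysite.com"))
```

One limitation worth knowing: the host part here allows only a single label before a 2 to 4 letter piece, so an address such as http://accounts.google.com is not matched by this sketch.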

Tomáš Zato - Reinstate Monica answered Nov 15 '22 11:11
