I am using a regular expression pattern on my blog site to turn URLs into clickable links, which works great. The pattern has this format:
/(href=")?([-a-zA-Z0-9@:%_\+.~#?&\/\/=]{2,256}\.[a-z]{2,4}\b(\/?[-a-zA-Z0-9@:%_\+.~#?&\/\/=]+)?)/
Recently, however, I found that this pattern also matches filenames, so when a user posts a filename in a comment, the system turns it into a link. You can see this effect here:
What I am trying to achieve is to match each of these URL formats except the last example (see image below), so that mysite.com or filename.php won't be highlighted.
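The effect can be reproduced with a small sketch. Python is an assumption here (the post does not say which language the blog uses), and `find_matches` is a hypothetical helper name; only the pattern itself comes from the post.

```python
import re

# The original pattern from the post, without the surrounding / delimiters.
ORIGINAL = re.compile(
    r'(href=")?([-a-zA-Z0-9@:%_\+.~#?&\/\/=]{2,256}\.[a-z]{2,4}\b'
    r'(\/?[-a-zA-Z0-9@:%_\+.~#?&\/\/=]+)?)'
)

def find_matches(text):
    """Return every substring that the original pattern would turn into a link."""
    return [m.group(0) for m in ORIGINAL.finditer(text)]

# A plain filename and a bare domain are both picked up.
print(find_matches("open filename.php now"))  # ['filename.php']
print(find_matches("mysite.com"))             # ['mysite.com']
```

Nothing in the pattern requires a scheme or a www. prefix, so any `name.ext` token with a 2 to 4 letter extension qualifies.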
Inputs that should be matched:
+--------------------------+------------------------------------------------------+
| Example                  | Explanation                                          |
+--------------------------+------------------------------------------------------+
| http(s)://www.mysite.com | because it starts with http(s):// and has URL format |
| www.mysite.com           | because it starts with www. and has URL format       |
+--------------------------+------------------------------------------------------+
Inputs that shouldn't be matched:
+-------------------+--------------------------------------------------+
| Example           | Explanation                                      |
+-------------------+--------------------------------------------------+
| mysite.com        | because it doesn't start with http(s):// or www. |
|                   | even though it has URL format                    |
| http(s)://mytext  | because it doesn't have URL format               |
| http://localhost/ | because it doesn't have URL format               |
+-------------------+--------------------------------------------------+
What does the URL format look like?
For this case, we can specify the URL format with this pattern:
([-a-zA-Z0-9_.]{2,256}\.[a-z]{2,4}\b(\/?[-a-zA-Z0-9:%_\+.~#?&\/=]+)?)
Examples:
google.com, google.co.uk, accounts.google.com, google.com/somepath/ ...
I tried adding www\. to this pattern, but then nothing matched. So how can I edit this regex to match URLs beginning with 'www' or 'http(s)://' and nothing else?
Thanks in advance.
This regexp is definitely not perfect, but it will do what you want:
(http[s]?:\/\/|www\.|ftp:\/\/){1,2}([-a-zA-Z0-9_]{2,256}\.[a-z]{2,4}\b(\/?[-a-zA-Z0-9@:%_\+.~#?&\/=]+)?)
It can be tricked into matching non-URLs, but that can't be abused. Making it smarter greatly increases its complexity.
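As a quick check, the pattern can be run against the examples from the question. This is only a sketch: Python is an assumption (the question does not name a language), `find_urls` is a hypothetical helper name, and the dot after www is escaped here so that it matches only a literal dot.

```python
import re

# Pattern from the answer, with the dot after "www" escaped.
URL_RE = re.compile(
    r'(http[s]?:\/\/|www\.|ftp:\/\/){1,2}'
    r'([-a-zA-Z0-9_]{2,256}\.[a-z]{2,4}\b(\/?[-a-zA-Z0-9@:%_\+.~#?&\/=]+)?)'
)

def find_urls(text):
    """Return every substring of text that the pattern treats as a URL."""
    return [m.group(0) for m in URL_RE.finditer(text)]

should_match = ["http://www.mysite.com", "https://www.mysite.com", "www.mysite.com"]
should_not_match = ["mysite.com", "filename.php", "http://mytext", "http://localhost/"]

for sample in should_match + should_not_match:
    # An empty list means the sample would not be turned into a link.
    print(sample, "->", find_urls(sample))
```

One limitation worth knowing: the host part has no dot in its character class and the label after the first dot is limited to 2 to 4 letters, so a host such as http://accounts.google.com is not matched. Adding the dot back to the host class, as in the question's URL format pattern, is one possible adjustment.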