
Auto-link regular expression

Tags: regex, url, php

I am using a PHP function to automatically turn URLs in a text string into actual links that people can click on. It works in most cases; however, I have found some cases where it does not.

I don't really understand regular expressions at all, so I was hoping someone could help me out with this.

Here is the pattern I'm currently using:

$pattern = "/(((http[s]?:\/\/)|(www\.))(([a-z][-a-z0-9]+\.)?[a-z][-a-z0-9]+\.[a-z]+(\.[a-z]{2,2})?)\/?[a-z0-9.,_\/~#&=;%+?-]+[a-z0-9\/#=?]{1,1})/is";

However here are some links I have found that this pattern is not matching:

  • www.oakvilletransit.ca - Not sure, but assuming it doesn't match because of the two-letter country code
  • www.grt.ca - Another one with the .ca domain that is not working
  • Several other .ca addresses
  • freepublictransports.com - Addresses without www. or http:// in front of them. I would like these to work as well.
  • www.222tips.com - Assuming it doesn't match because of the numbers at the beginning of the address.

Does anyone know how I can modify that regex pattern to match these cases as well?

EDIT - It should also match URLs that may have a period at the end. If a URL is the last part of a sentence there may be a period at the end that should not be included in the actual link. Currently this pattern takes that into account as well.

EDIT 2 - I am using the pattern like this:

$pattern = "/((http|https):\/\/)?([a-z0-9-]+\.)?[a-z][a-z0-9-]+(\.[a-z]{2,6}){1,3}(\/[a-z0-9.,_\/~#&=;%+?-]*)?/is";
  $string = preg_replace($pattern, " <a target='_blank' href='$1'>$1</a>", $string);
  // fix URLs without protocols
  $string = preg_replace("/href='www/", "href='http://www", $string);
  return $string;
Asked by Sherwin Flight, Jun 03 '12 23:06



2 Answers

The following regex will match URLs:

  • (Optionally) With http:// or https://
  • (Optionally) With a subdomain (www.example.com, help.example.com, etc)
  • With 1-3 domain extensions, each of which must be 2-6 letters (www.example.com.gu, www.example.com.au.museum, etc)
  • (Optionally) With a forward slash at the end
  • (Optionally) With valid characters after the forward slash

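The i modifier at the end makes it case insensitive. The s modifier (dot matches newlines) has no effect here, because the pattern contains no unescaped dot outside a character class.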

/((http|https):\/\/)?([a-z0-9-]+\.)?[a-z0-9-]+(\.[a-z]{2,6}){1,3}(\/[a-z0-9.,_\/~#&=;%+?-]*)?/is
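As a quick sanity check, the pattern can be run against the addresses from the question. This is only a sketch: the ^ and $ anchors are an assumption added here so that each sample must match as a whole, and the last sample URL is made up for illustration.

```php
<?php
// Pattern from this answer, wrapped in ^...$ for whole-string testing only.
// (The anchors are an addition for this check, not part of the answer's pattern.)
$pattern = '/^((http|https):\/\/)?([a-z0-9-]+\.)?[a-z0-9-]+(\.[a-z]{2,6}){1,3}(\/[a-z0-9.,_\/~#&=;%+?-]*)?$/is';

$samples = [
    'www.oakvilletransit.ca',
    'www.grt.ca',
    'freepublictransports.com',
    'www.222tips.com',
    'http://www.example.com/path?x=1', // illustrative URL, not from the question
];

foreach ($samples as $url) {
    $matched = preg_match($pattern, $url) === 1;
    echo $url, ' => ', $matched ? 'match' : 'no match', "\n";
}
```

Each of these samples should be reported as a match. Without the anchors, the same pattern finds URLs anywhere inside a longer string, which is what preg_replace() needs.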

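Edit: When the URL has no path, this will not match a "hanging" period at the end (such as the end of a sentence), because that period is not part of the URL and shouldn't be included in the href attribute of your link. If the URL has a path, a trailing period is matched, because . is among the characters allowed after the forward slash.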
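The trailing-period behaviour can be checked with preg_match(). A minimal sketch with made-up sentences: when a path is present, the period falls inside the path character class, so it ends up in the match; trimming it with rtrim() is one possible workaround, not something the answer itself prescribes.

```php
<?php
$pattern = '/((http|https):\/\/)?([a-z0-9-]+\.)?[a-z0-9-]+(\.[a-z]{2,6}){1,3}(\/[a-z0-9.,_\/~#&=;%+?-]*)?/is';

// No path: the sentence-ending period is left out of the match.
preg_match($pattern, 'Check the schedule at www.grt.ca.', $m);
echo $m[0], "\n";             // www.grt.ca

// With a path, "." is allowed by the path character class, so it is matched too.
preg_match($pattern, 'See example.com/page.', $m);
echo $m[0], "\n";             // example.com/page.
echo rtrim($m[0], '.'), "\n"; // example.com/page
```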

Edit 2: In your first preg_replace(), change $1 to $0. This will insert the entire matched string instead of a single part of it.
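To illustrate the difference, here is a small sketch with a made-up input string. With this pattern, $1 refers only to the first capture group (the optional scheme), while $0 is the whole match.

```php
<?php
$pattern = '/((http|https):\/\/)?([a-z0-9-]+\.)?[a-z0-9-]+(\.[a-z]{2,6}){1,3}(\/[a-z0-9.,_\/~#&=;%+?-]*)?/is';
$text = 'See http://www.example.com/docs for details.';

// $1 is only the first capture group, i.e. the scheme "http://".
$withGroup1 = preg_replace($pattern, "<a target='_blank' href='$1'>$1</a>", $text);

// $0 is the entire matched URL.
$withWhole = preg_replace($pattern, "<a target='_blank' href='$0'>$0</a>", $text);

echo $withGroup1, "\n";
// See <a target='_blank' href='http://'>http://</a> for details.
echo $withWhole, "\n";
// See <a target='_blank' href='http://www.example.com/docs'>http://www.example.com/docs</a> for details.
```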

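Edit 3: (Update 2) Here's a better way to add http:// only when the link does not already start with http:// or https://. The negative lookahead (?!https?:\/\/) checks what follows href=' without consuming any of the URL:

$string = preg_replace("/href='(?!https?:\/\/)/i", "href='http://", $string);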
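One way to add a scheme only where it is missing is a negative lookahead. The sketch below assumes href values are single-quoted, as in the question's code; addMissingScheme is a hypothetical helper name used only for illustration.

```php
<?php
// Hypothetical helper; assumes single-quoted href attributes as in the question.
function addMissingScheme($html)
{
    // (?!https?://) succeeds only when the href value does not already
    // start with http:// or https://, and it consumes no characters.
    return preg_replace("#href='(?!https?://)#i", "href='http://", $html);
}

echo addMissingScheme("<a href='www.grt.ca'>www.grt.ca</a>"), "\n";
// <a href='http://www.grt.ca'>www.grt.ca</a>
echo addMissingScheme("<a href='https://example.com'>https://example.com</a>"), "\n";
// <a href='https://example.com'>https://example.com</a>
```

Because the lookahead is zero-width, nothing from the URL itself is replaced, and addresses that happen to start with the letter h (without a scheme) still get the prefix.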
Answered by Litty, Oct 23 '22 09:10



I had problems with all the examples above.

Here is one that works:

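function autolink($string){
        // Wrap every http:// URL (any run of non-whitespace characters) in a link.
        // The original U modifier combined with +? made the quantifier greedy, so
        // a plain greedy \S+ behaves the same. https:// and bare domains are not handled.
        $string = preg_replace('#http://(\S+)#i', '<a href="http://$1">$1</a>', $string);
        return $string;
}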
Answered by boksiora, Oct 23 '22 09:10
