These are the kinds of links that appear in the text. A link may be preceded by whitespace or may be part of a longer string, for example: sometexthttp://www.domain.extension?parameters
1. http://domain.extension?parameter
2. http://subdomain.domain.extension?parameters
3. https://domain.extension?parameter
4. https://subdomain.domain.extension?parameters
5. www.domain.extension?parameter
I wrote the following function, which partially works. The first regex finds every string containing "www." and prefixes it with "http://". The second regex wraps the resulting URLs in "a" tags.
function MakeClickableLinks($text) {
    $text = preg_replace('(((www).([-\w\.]+)+(:\d+)?(/([\w/_\.%-=#]*(\?\S+)?)?)?))', ' http://$1', $text);
    $text = preg_replace('@(https?://([-\w\.]+)+(:\d+)?(/([\w/_\.%-=#]*(\?\S+)?)?)?)@', '<a href="$1" rel="nofollow" target="_blank">$1</a>', $text);
    return $text;
}
This is the test string: $text = 'some-texthttps://www.sdfsd.com some-texthttp://www.sdfsd.com http://www.sdfsd.com https://www.ertert.com sometextwww.ssssss.com www.hhhh.com www.hhhh.comsdfsdfs';
This is the current output:
some-texthttps:// http://www.sdfsd.com some-texthttp://
http://www.sdfsd.com http:// http://www.sdfsd.com https://
http://www.ertert.com sometext http://www.ssssss.com http://www.hhhh.com
http://www.hhhh.comsdfsdfs
The problem is that the first regex also inserts an extra "http://" into proper URLs that already start with http:// or https://. For example:
"http://www.domain.extension"
gets converted into this:
"http:// http://www.domain.extension"
Using a negative lookbehind to make sure that the "www" is not preceded by a forward slash "/" solves the problem: URLs that already start with http:// or https:// no longer get the unwanted insertion.
Here is the modified first regex from the original question:
((?<![/])((www).([-\w\.]+)+(:\d+)?(/([\w/_\.%-=#]*(\?\S+)?)?)?))
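To see the effect of the lookbehind in isolation, here is a minimal sketch that runs the pattern above on a short string. The sample string and the variable names are made up for illustration, and the pattern is wrapped in @ delimiters, which is an assumption about how it would be passed to preg_replace.

```php
<?php
// Hypothetical sample input: one full URL, one bare "www." link,
// and one "www." link glued to preceding text.
$sample = 'http://www.a.com www.b.com textwww.c.com';

// The modified first regex, wrapped in @ delimiters.
// (?<![/]) rejects any "www" that directly follows a forward slash.
$pattern = '@((?<![/])((www).([-\w\.]+)+(:\d+)?(/([\w/_\.%-=#]*(\?\S+)?)?)?))@';

// Prefix each remaining match with " http://" (note the leading space).
$result = preg_replace($pattern, ' http://$1', $sample);

echo $result, PHP_EOL;
```

The first URL is left untouched because its "www" follows "//", while the two bare links each gain a " http://" prefix. The leading space in the replacement means a link that was already preceded by a space ends up with two spaces in front of it.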
Here is the complete working function, using the first regex with the negative lookbehind together with the second regex suggested by Simo.
function MakeClickableLinks($text) {
    // Prefix bare "www." links with " http://", skipping any "www" that directly follows a "/".
    $text = preg_replace('@((?<![/])((www\.).([-\w\.]+)+(:\d+)?(/([\w/_\.%-=#]*(\?\S+)?)?)?))@', ' http://$1', $text);
    // Wrap every http:// or https:// URL in an anchor tag.
    $text = preg_replace("/((https?:\/\/)[^\s]+)/", '<a href="$1" rel="nofollow" target="_blank" >$1</a>', $text);
    return $text;
}
This has been tested with PHP 7 and catches the majority of URLs within plain text. A possible further improvement is limiting the URL length.
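One possible way to limit the URL length is to do the wrapping step with preg_replace_callback and leave overly long matches as plain text. This is only a sketch: the function name makeClickableLinksWithLimit, the $maxLength parameter and its default of 100 are assumptions, not part of the function above.

```php
<?php
// Hypothetical helper: wrap http(s) URLs in anchor tags,
// but only when the URL is at most $maxLength bytes long.
function makeClickableLinksWithLimit($text, $maxLength = 100) {
    return preg_replace_callback(
        '/https?:\/\/[^\s]+/',
        function ($m) use ($maxLength) {
            $url = $m[0];
            if (strlen($url) > $maxLength) {
                // Too long: return the match unchanged, so it stays plain text.
                return $url;
            }
            return '<a href="' . $url . '" rel="nofollow" target="_blank">' . $url . '</a>';
        },
        $text
    );
}

// Short URL: gets linked.
$short = makeClickableLinksWithLimit('see http://a.com now', 20);

// URL longer than the limit: left as it was.
$long = makeClickableLinksWithLimit('see http://a.com/' . str_repeat('x', 50), 20);
```

Skipping long URLs entirely, rather than truncating them, avoids producing links that point to a cut-off address.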
It would also be a good idea to run the resulting HTML through an XSS-cleaning library to remove any potential XSS from the URLs.
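As a lightweight first step, and not a replacement for a dedicated XSS-cleaning library, the matched URL can be escaped with htmlspecialchars before it is placed into the tag. Because [^\s]+ also matches quote characters, an unescaped URL could otherwise break out of the href attribute. The function name makeClickableLinksEscaped is hypothetical, and the sketch escapes only the matched URL, not the surrounding text.

```php
<?php
// Hypothetical helper: wrap http(s) URLs in anchor tags,
// HTML-escaping each URL before inserting it into the markup.
function makeClickableLinksEscaped($text) {
    return preg_replace_callback(
        '/https?:\/\/[^\s]+/',
        function ($m) {
            // ENT_QUOTES escapes both double and single quotes,
            // so the URL cannot terminate the href attribute early.
            $safe = htmlspecialchars($m[0], ENT_QUOTES, 'UTF-8');
            return '<a href="' . $safe . '" rel="nofollow" target="_blank">' . $safe . '</a>';
        },
        $text
    );
}

// A URL containing double quotes, which would otherwise end the attribute.
$html = makeClickableLinksEscaped('http://a.com/?q="x"');

echo $html, PHP_EOL;
```

The quotes in the URL come out as &quot; in both the href value and the link text, so the attribute stays intact.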