
How to write a function to convert plain text into clickable links where text contains 5 types of URLs

Tags: regex, php

The text contains the following types of links. A link may be preceded by whitespace or may be embedded in a longer string, for example: sometexthttp://www.domain.extension?parameters

1. http://domain.extension?parameter  
2. http://subdomain.domain.extension?parameters
3. https://domain.extension?parameter
4. https://subdomain.domain.extension?parameters
5. www.domain.extension?parameter  

I wrote the following function, which partially works. The first regex finds all strings containing "www." and prefixes them with "http://". The second regex wraps the resulting URLs in "a" tags.

function MakeClickableLinks($text) {
    $text = preg_replace('(((www).([-\w\.]+)+(:\d+)?(/([\w/_\.%-=#]*(\?\S+)?)?)?))', ' http://$1', $text);
    $text = preg_replace('@(https?://([-\w\.]+)+(:\d+)?(/([\w/_\.%-=#]*(\?\S+)?)?)?)@', '<a href="$1" rel="nofollow" target="_blank">$1</a>', $text);

    return $text;
}

This is the test string:

$text = 'some-texthttps://www.sdfsd.com some-texthttp://www.sdfsd.com http://www.sdfsd.com https://www.ertert.com sometextwww.ssssss.com www.hhhh.com www.hhhh.comsdfsdfs';

This is the current output:

some-texthttps:// http://www.sdfsd.com some-texthttp:// http://www.sdfsd.com http:// http://www.sdfsd.com https:// http://www.ertert.com sometext http://www.ssssss.com http://www.hhhh.com http://www.hhhh.comsdfsdfs

The problem is that the first regex also inserts an extra "http://" into proper URLs that already start with http:// or https://. For example:

"http://www.domain.extension" 
gets converted into this:
"http:// http://www.domain.extension"
Jimski asked Nov 26 '25 22:11


1 Answer

Using a negative lookbehind to make sure that "www" is not preceded by a forward slash "/" solves the problem: any "www" that already follows http:// or https:// is excluded from the undesired insertion.

Here is the modified first regex from the original question.

((?<![/])((www).([-\w\.]+)+(:\d+)?(/([\w/_\.%-=#]*(\?\S+)?)?)?))
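To see the lookbehind in isolation, here is a minimal sketch. The sample string and the simplified host pattern www\.[-\w.]+ are assumptions made for illustration; this is not the full regex from the question.

```php
<?php
// Hypothetical sample input: one URL that already has a scheme, one bare host.
$sample = 'http://www.example.com and www.example.org';

// Without the lookbehind, every "www." gets a prefix, even inside an existing URL.
$naive = preg_replace('@(www\.[-\w.]+)@', ' http://$1', $sample);

// (?<![/]) asserts that the character just before "www." is not "/",
// so the "www." that follows "http://" is left alone.
$guarded = preg_replace('@(?<![/])(www\.[-\w.]+)@', ' http://$1', $sample);

echo $naive, PHP_EOL;
echo $guarded, PHP_EOL;
```

The first call reproduces the doubled "http:// http://" from the question; the second only prefixes the bare www.example.org.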


Here is the complete working function, using the first regex with the negative lookbehind together with the second regex suggested by Simo.

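function MakeClickableLinks($text) {
    // Prefix bare "www." hosts with " http://". The lookbehind (?<![/]) skips any
    // "www." that directly follows "/", i.e. one already inside http:// or https://.
    $text = preg_replace('@(?<![/])(www\.[-\w.]+(:\d+)?(/([\w/_.%-=#]*(\?\S+)?)?)?)@', ' http://$1', $text);
    // Wrap every http:// or https:// URL, up to the next whitespace, in an anchor tag.
    $text = preg_replace('@(https?://\S+)@', '<a href="$1" rel="nofollow" target="_blank">$1</a>', $text);
    return $text;
}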

This has been tested with PHP 7 and catches the majority of URLs within plain text. Additional improvements could include limiting the URL length.
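One way a length limit could be added is sketched below, using preg_replace_callback so that over-long matches are left as plain text. The function name linkUrlsWithLengthLimit, the constant MAX_URL_LENGTH and the 2048 cap are all hypothetical; pick whatever limit suits your data.

```php
<?php
// Hypothetical cap; the answer does not name a specific limit.
const MAX_URL_LENGTH = 2048;

function linkUrlsWithLengthLimit(string $text, int $maxLength = MAX_URL_LENGTH): string
{
    $result = preg_replace_callback(
        '@https?://\S+@',
        function (array $m) use ($maxLength): string {
            $url = $m[0];
            // Leave over-long matches as plain text instead of linking them.
            if (strlen($url) > $maxLength) {
                return $url;
            }
            return '<a href="' . $url . '" rel="nofollow" target="_blank">' . $url . '</a>';
        },
        $text
    );

    // preg_replace_callback returns null on a regex error; fall back to the input.
    return $result ?? $text;
}
```

This would replace only the second (link-wrapping) step; the "www." prefixing step stays as it is.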

Also, it would be a good idea to run the resulting HTML through an XSS-cleaning library to remove any potential XSS from the URLs.
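If no cleaning library is at hand, a simpler fallback is to escape the text with htmlspecialchars before wrapping the links. Note that this is a different technique from the library approach mentioned above, and the function name linkUrlsEscaped is hypothetical. A rough sketch, assuming the input is plain text that should not contain any HTML of its own:

```php
<?php
function linkUrlsEscaped(string $text): string
{
    // Escape the whole text first so any markup in the input is neutralised.
    $escaped = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');

    // \S+ now runs over already-escaped text, so a quote inside a URL has
    // become an entity and cannot break out of the href attribute.
    $linked = preg_replace(
        '@https?://\S+@',
        '<a href="$0" rel="nofollow" target="_blank">$0</a>',
        $escaped
    );

    // preg_replace returns null on a regex error; fall back to the escaped text.
    return $linked ?? $escaped;
}
```

Because the pattern only matches http:// and https://, schemes such as javascript: are never turned into links. This is not a substitute for a proper sanitizer if the text is later mixed with other HTML.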

Jimski answered Nov 29 '25 11:11


