Here is a design though: For example is I put a link such as <blockquote> http://example.com </blockquote> in textarea. How do I get PHP to detect it’s a <code>http://</code> link and then print it as <pre class="prettyprint"><code>print "<a href='http://www.example.com'>http://www.example.com</a>"; </code></pre> I remember doing something like this before however, it was not fool proof it kept breaking for complex links. Another good idea would be if you have a link such as <blockquote> http://example.com/test.php?val1=bla&val2blablabla%20bla%20bla.bl </blockquote> fix it so it does <pre class="prettyprint"><code>print "<a href='http://example.com/test.php?val1=bla&val2=bla%20bla%20bla.bla'>"; print "http://example.com/test.php"; print "</a>"; </code></pre> This one is just an after thought.. stackoverflow could also probably use this as well :D Any Ideas

Let's look at the requirements. You have some user-supplied plain text, which you want to display with hyperlinked URLs. <ol> <li>The "http://" protocol prefix should be optional.</li> <li>Both domains and IP addresses should be accepted.</li> <li>Any valid top-level domain should be accepted, e.g. .aero and .xn--jxalpdlp.</li> <li>Port numbers should be allowed.</li> <li>URLs must be allowed in normal sentence contexts. For instance, in "Visit stackoverflow.com.", the final period is not part of the URL.</li> <li>You probably want to allow "https://" URLs as well, and perhaps others as well.</li> <li>As always when displaying user supplied text in HTML, you want to prevent cross-site scripting (XSS). Also, you'll want ampersands in URLs to be correctly escaped as &amp;.</li> <li>You probably don't need support for IPv6 addresses.</li> <li> Edit: As noted in the comments, support for email-adresses is definitely a plus.</li> <li> Edit: Only plain text input is to be supported – HTML tags in the input should not be honoured. (The Bitbucket version supports HTML input.)</li> </ol> Edit: Check out GitHub for the latest version, with support for email addresses, authenticated URLs, URLs in quotes and parentheses, HTML input, as well as an updated TLD list. Here's my take: <pre class="prettyprint"><code><?php $text = <<<EOD Here are some URLs: stackoverflow.com/questions/1188129/pregreplace-to-detect-html-php Here's the answer: http://www.google.com/search?rls=en&q=42&ie=utf-8&oe=utf-8&hl=en. What was the question? A quick look at http://en.wikipedia.org/wiki/URI_scheme#Generic_syntax is helpful. There is no place like 127.0.0.1! Except maybe http://news.bbc.co.uk/1/hi/england/surrey/8168892.stm? Ports: 192.168.0.1:8080, https://example.net:1234/. Beware of Greeks bringing internationalized top-level domains: xn--hxajbheg2az3al.xn--jxalpdlp. And remember.Nobody is perfect. <script>alert('Remember kids: Say no to XSS-attacks! Always HTML escape untrusted input!');</script> EOD; $rexProtocol = '(https?://)?'; $rexDomain = '((?:[-a-zA-Z0-9]{1,63}\.)+[-a-zA-Z0-9]{2,63}|(?:[0-9]{1,3}\.){3}[0-9]{1,3})'; $rexPort = '(:[0-9]{1,5})?'; $rexPath = '(/[!$-/0-9:;=@_\':;!a-zA-Z\x7f-\xff]*?)?'; $rexQuery = '(\?[!$-/0-9:;=@_\':;!a-zA-Z\x7f-\xff]+?)?'; $rexFragment = '(#[!$-/0-9:;=@_\':;!a-zA-Z\x7f-\xff]+?)?'; // Solution 1: function callback($match) { // Prepend http:// if no protocol specified $completeUrl = $match[1] ? $match[0] : "http://{$match[0]}"; return '<a href="' . $completeUrl . '">' . $match[2] . $match[3] . $match[4] . '</a>'; } print "<pre>"; print preg_replace_callback("&\\b$rexProtocol$rexDomain$rexPort$rexPath$rexQuery$rexFragment(?=[?.!,;:\"]?(\s|$))&", 'callback', htmlspecialchars($text)); print "</pre>"; </code></pre> <ul> <li>To properly escape < and & characters, I throw the whole text through htmlspecialchars before processing. This is not ideal, as the html escaping can cause misdetection of URL boundaries.</li> <li>As demonstrated by the "And remember.Nobody is perfect." line (in which remember.Nobody is treated as an URL, because of the missing space), further checking on valid top-level domains might be in order.</li> </ul> Edit: The following code fixes the above two problems, but is quite a bit more verbose since I'm more or less re-implementing <code>preg_replace_callback</code> using <code>preg_match</code>. <pre class="prettyprint"><code>// Solution 2: $validTlds = array_fill_keys(explode(" ", ".aero .asia .biz .cat .com .coop .edu .gov .info .int .jobs .mil .mobi .museum .name .net .org .pro .tel .travel .ac .ad .ae .af .ag .ai .al .am .an .ao .aq .ar .as .at .au .aw .ax .az .ba .bb .bd .be .bf .bg .bh .bi .bj .bm .bn .bo .br .bs .bt .bv .bw .by .bz .ca .cc .cd .cf .cg .ch .ci .ck .cl .cm .cn .co .cr .cu .cv .cx .cy .cz .de .dj .dk .dm .do .dz .ec .ee .eg .er .es .et .eu .fi .fj .fk .fm .fo .fr .ga .gb .gd .ge .gf .gg .gh .gi .gl .gm .gn .gp .gq .gr .gs .gt .gu .gw .gy .hk .hm .hn .hr .ht .hu .id .ie .il .im .in .io .iq .ir .is .it .je .jm .jo .jp .ke .kg .kh .ki .km .kn .kp .kr .kw .ky .kz .la .lb .lc .li .lk .lr .ls .lt .lu .lv .ly .ma .mc .md .me .mg .mh .mk .ml .mm .mn .mo .mp .mq .mr .ms .mt .mu .mv .mw .mx .my .mz .na .nc .ne .nf .ng .ni .nl .no .np .nr .nu .nz .om .pa .pe .pf .pg .ph .pk .pl .pm .pn .pr .ps .pt .pw .py .qa .re .ro .rs .ru .rw .sa .sb .sc .sd .se .sg .sh .si .sj .sk .sl .sm .sn .so .sr .st .su .sv .sy .sz .tc .td .tf .tg .th .tj .tk .tl .tm .tn .to .tp .tr .tt .tv .tw .tz .ua .ug .uk .us .uy .uz .va .vc .ve .vg .vi .vn .vu .wf .ws .ye .yt .yu .za .zm .zw .xn--0zwm56d .xn--11b5bs3a9aj6g .xn--80akhbyknj4f .xn--9t4b11yi5a .xn--deba0ad .xn--g6w251d .xn--hgbk6aj7f53bba .xn--hlcj6aya9esc7a .xn--jxalpdlp .xn--kgbechtv .xn--zckzah .arpa"), true); $position = 0; while (preg_match("{\\b$rexProtocol$rexDomain$rexPort$rexPath$rexQuery$rexFragment(?=[?.!,;:\"]?(\s|$))}", $text, &$match, PREG_OFFSET_CAPTURE, $position)) { list($url, $urlPosition) = $match[0]; // Print the text leading up to the URL. print(htmlspecialchars(substr($text, $position, $urlPosition - $position))); $domain = $match[2][0]; $port = $match[3][0]; $path = $match[4][0]; // Check if the TLD is valid - or that $domain is an IP address. $tld = strtolower(strrchr($domain, '.')); if (preg_match('{\.[0-9]{1,3}}', $tld) || isset($validTlds[$tld])) { // Prepend http:// if no protocol specified $completeUrl = $match[1][0] ? $url : "http://$url"; // Print the hyperlink. printf('<a href="%s">%s</a>', htmlspecialchars($completeUrl), htmlspecialchars("$domain$port$path")); } else { // Not a valid URL. print(htmlspecialchars($url)); } // Continue text parsing from after the URL. $position = $urlPosition + strlen($url); } // Print the remainder of the text. print(htmlspecialchars(substr($text, $position))); </code></pre>

Here is something i found that is tried and tested <pre class="prettyprint"><code>function make_links_blank($text) { return preg_replace( array( '/(?(?=<a[^>]*>.+<\/a>) (?:<a[^>]*>.+<\/a>) | ([^="\']?)((?:https?|ftp|bf2|):\/\/[^<> \n\r]+) )/iex', '/<a([^>]*)target="?[^"\']+"?/i', '/<a([^>]+)>/i', '/(^|\s)(www.[^<> \n\r]+)/iex', '/(([_A-Za-z0-9-]+)(\\.[_A-Za-z0-9-]+)*@([A-Za-z0-9-]+) (\\.[A-Za-z0-9-]+)*)/iex' ), array( "stripslashes((strlen('\\2')>0?'\\1<a href=\"\\2\">\\2</a>\\3':'\\0'))", '<a\\1', '<a\\1 target="_blank">', "stripslashes((strlen('\\2')>0?'\\1<a href=\"http://\\2\">\\2</a>\\3':'\\0'))", "stripslashes((strlen('\\2')>0?'<a href=\"mailto:\\0\">\\0</a>':'\\0'))" ), $text ); } </code></pre> It works for me. And it works for emails and URL's, Sorry to answer my own question. :( But this one is the only that works Here is the link where i found it : http://www.experts-exchange.com/Web_Development/Web_Languages-Standards/PHP/Q_21878567.html Sry in advance for it being a experts-exchange.

You guyz are talking way to advance and complex stuff which is good for some situation, but mostly we need a simple careless solution. How about simply this? <pre class="prettyprint"><code>preg_replace('/(http[s]{0,1}\:\/\/\S{4,})\s{0,}/ims', '<a href="$1" target="_blank">$1</a> ', $text_msg); </code></pre> Just try it and let me know what crazy url it doesnt satisfy.

Here is the code using Regular Expressions in function <pre class="prettyprint"><code><?php //Function definations function MakeUrls($str) { $find=array('`((?:https?|ftp)://\S+[[:alnum:]]/?)`si','`((?<!//)(www\.\S+[[:alnum:]]/?))`si'); $replace=array('<a href="$1" target="_blank">$1</a>', '<a href="http://$1" target="_blank">$1</a>'); return preg_replace($find,$replace,$str); } //Function testing $str="www.cloudlibz.com"; $str=MakeUrls($str); echo $str; ?> </code></pre>

this should get you email addresses: <pre class="prettyprint"><code>$string = "bah bah steve@gmail.com foo"; $match = preg_match('/[^\x00-\x20()<>@,;:\\".[\]\x7f-\xff]+(?:\.[^\x00-\x20()<>@,;:\\".[\]\x7f-\xff]+)*\@[^\x00-\x20()<>@,;:\\".[\]\x7f-\xff]+(?:\.[^\x00-\x20()<>@,;:\\".[\]\x7f-\xff]+)+/', $string, $array); print_r($array); // outputs: Array ( [0] => steve@gmail.com ) </code></pre>

I know this answer has been accepted and that this question is quite old, but it can be useful for other people looking for other implementations. This is a modified version of the code posted by: Angel.King.47 on July 27,09: <pre class="prettyprint"><code>$text = preg_replace( array( '/(^|\s|>)(www.[^<> \n\r]+)/iex', '/(^|\s|>)([_A-Za-z0-9-]+(\\.[A-Za-z]{2,3})?\\.[A-Za-z]{2,4}\\/[^<> \n\r]+)/iex', '/(?(?=<a[^>]*>.+<\/a>)(?:<a[^>]*>.+<\/a>)|([^="\']?)((?:https?):\/\/([^<> \n\r]+)))/iex' ), array( "stripslashes((strlen('\\2')>0?'\\1<a href=\"http://\\2\" target=\"_blank\">\\2</a>&nbsp;\\3':'\\0'))", "stripslashes((strlen('\\2')>0?'\\1<a href=\"http://\\2\" target=\"_blank\">\\2</a>&nbsp;\\4':'\\0'))", "stripslashes((strlen('\\2')>0?'\\1<a href=\"\\2\" target=\"_blank\">\\3</a>&nbsp;':'\\0'))", ), $text ); </code></pre> Changes: <ul> <li>I removed rules #2 and #3 (I'm not sure in which situations are useful). </li> <li>Removed email parsing as I really don't need it.</li> <li>I added one more rule which allows the recognition of URLs in the form: [domain]/* (without www). For example: "example.com/faq/" (Multiple tld: domain.{2-3}.{2-4}/)</li> <li>When parsing strings starting with "http://", it removes it from the link label.</li> <li>Added "target='_blank'" to all links.</li> <li>Urls can be specified just after any(?) tag. For example: www.example.com</li> </ul> As "Søren Løvborg" has stated, this function does not escape the URLs. I tried his/her class but it just didn't work as I expected (If you don't trust your users, then try his/her code first).

Replace URLs in text with HTML links

Tags:

regex

url

php

preg-replace

linkify

Here is a design though: For example is I put a link such as

http://example.com

in textarea. How do I get PHP to detect it’s a http:// link and then print it as

print "<a href='http://www.example.com'>http://www.example.com</a>";

I remember doing something like this before however, it was not fool proof it kept breaking for complex links.

Another good idea would be if you have a link such as

http://example.com/test.php?val1=bla&val2blablabla%20bla%20bla.bl

fix it so it does

print "<a href='http://example.com/test.php?val1=bla&val2=bla%20bla%20bla.bla'>";
print "http://example.com/test.php";
print "</a>";

This one is just an after thought.. stackoverflow could also probably use this as well :D

Any Ideas

398

asked Jul 27 '09 13:07

Angel.King.47

9 Answers

Let's look at the requirements. You have some user-supplied plain text, which you want to display with hyperlinked URLs.

The "http://" protocol prefix should be optional.
Both domains and IP addresses should be accepted.
Any valid top-level domain should be accepted, e.g. .aero and .xn--jxalpdlp.
Port numbers should be allowed.
URLs must be allowed in normal sentence contexts. For instance, in "Visit stackoverflow.com.", the final period is not part of the URL.
You probably want to allow "https://" URLs as well, and perhaps others as well.
As always when displaying user supplied text in HTML, you want to prevent cross-site scripting (XSS). Also, you'll want ampersands in URLs to be correctly escaped as &.
You probably don't need support for IPv6 addresses.
Edit: As noted in the comments, support for email-adresses is definitely a plus.
Edit: Only plain text input is to be supported – HTML tags in the input should not be honoured. (The Bitbucket version supports HTML input.)

Edit: Check out GitHub for the latest version, with support for email addresses, authenticated URLs, URLs in quotes and parentheses, HTML input, as well as an updated TLD list.

Here's my take:

<?php
$text = <<<EOD
Here are some URLs:
stackoverflow.com/questions/1188129/pregreplace-to-detect-html-php
Here's the answer: http://www.google.com/search?rls=en&q=42&ie=utf-8&oe=utf-8&hl=en. What was the question?
A quick look at http://en.wikipedia.org/wiki/URI_scheme#Generic_syntax is helpful.
There is no place like 127.0.0.1! Except maybe http://news.bbc.co.uk/1/hi/england/surrey/8168892.stm?
Ports: 192.168.0.1:8080, https://example.net:1234/.
Beware of Greeks bringing internationalized top-level domains: xn--hxajbheg2az3al.xn--jxalpdlp.
And remember.Nobody is perfect.

<script>alert('Remember kids: Say no to XSS-attacks! Always HTML escape untrusted input!');</script>
EOD;

$rexProtocol = '(https?://)?';
$rexDomain   = '((?:[-a-zA-Z0-9]{1,63}\.)+[-a-zA-Z0-9]{2,63}|(?:[0-9]{1,3}\.){3}[0-9]{1,3})';
$rexPort     = '(:[0-9]{1,5})?';
$rexPath     = '(/[!$-/0-9:;=@_\':;!a-zA-Z\x7f-\xff]*?)?';
$rexQuery    = '(\?[!$-/0-9:;=@_\':;!a-zA-Z\x7f-\xff]+?)?';
$rexFragment = '(#[!$-/0-9:;=@_\':;!a-zA-Z\x7f-\xff]+?)?';

// Solution 1:

function callback($match)
{
    // Prepend http:// if no protocol specified
    $completeUrl = $match[1] ? $match[0] : "http://{$match[0]}";

    return '<a href="' . $completeUrl . '">'
        . $match[2] . $match[3] . $match[4] . '</a>';
}

print "<pre>";
print preg_replace_callback("&\\b$rexProtocol$rexDomain$rexPort$rexPath$rexQuery$rexFragment(?=[?.!,;:\"]?(\s|$))&",
    'callback', htmlspecialchars($text));
print "</pre>";

To properly escape < and & characters, I throw the whole text through htmlspecialchars before processing. This is not ideal, as the html escaping can cause misdetection of URL boundaries.
As demonstrated by the "And remember.Nobody is perfect." line (in which remember.Nobody is treated as an URL, because of the missing space), further checking on valid top-level domains might be in order.

Edit: The following code fixes the above two problems, but is quite a bit more verbose since I'm more or less re-implementing preg_replace_callback using preg_match.

// Solution 2:

$validTlds = array_fill_keys(explode(" ", ".aero .asia .biz .cat .com .coop .edu .gov .info .int .jobs .mil .mobi .museum .name .net .org .pro .tel .travel .ac .ad .ae .af .ag .ai .al .am .an .ao .aq .ar .as .at .au .aw .ax .az .ba .bb .bd .be .bf .bg .bh .bi .bj .bm .bn .bo .br .bs .bt .bv .bw .by .bz .ca .cc .cd .cf .cg .ch .ci .ck .cl .cm .cn .co .cr .cu .cv .cx .cy .cz .de .dj .dk .dm .do .dz .ec .ee .eg .er .es .et .eu .fi .fj .fk .fm .fo .fr .ga .gb .gd .ge .gf .gg .gh .gi .gl .gm .gn .gp .gq .gr .gs .gt .gu .gw .gy .hk .hm .hn .hr .ht .hu .id .ie .il .im .in .io .iq .ir .is .it .je .jm .jo .jp .ke .kg .kh .ki .km .kn .kp .kr .kw .ky .kz .la .lb .lc .li .lk .lr .ls .lt .lu .lv .ly .ma .mc .md .me .mg .mh .mk .ml .mm .mn .mo .mp .mq .mr .ms .mt .mu .mv .mw .mx .my .mz .na .nc .ne .nf .ng .ni .nl .no .np .nr .nu .nz .om .pa .pe .pf .pg .ph .pk .pl .pm .pn .pr .ps .pt .pw .py .qa .re .ro .rs .ru .rw .sa .sb .sc .sd .se .sg .sh .si .sj .sk .sl .sm .sn .so .sr .st .su .sv .sy .sz .tc .td .tf .tg .th .tj .tk .tl .tm .tn .to .tp .tr .tt .tv .tw .tz .ua .ug .uk .us .uy .uz .va .vc .ve .vg .vi .vn .vu .wf .ws .ye .yt .yu .za .zm .zw .xn--0zwm56d .xn--11b5bs3a9aj6g .xn--80akhbyknj4f .xn--9t4b11yi5a .xn--deba0ad .xn--g6w251d .xn--hgbk6aj7f53bba .xn--hlcj6aya9esc7a .xn--jxalpdlp .xn--kgbechtv .xn--zckzah .arpa"), true);

$position = 0;
while (preg_match("{\\b$rexProtocol$rexDomain$rexPort$rexPath$rexQuery$rexFragment(?=[?.!,;:\"]?(\s|$))}", $text, &$match, PREG_OFFSET_CAPTURE, $position))
{
    list($url, $urlPosition) = $match[0];

    // Print the text leading up to the URL.
    print(htmlspecialchars(substr($text, $position, $urlPosition - $position)));

    $domain = $match[2][0];
    $port   = $match[3][0];
    $path   = $match[4][0];

    // Check if the TLD is valid - or that $domain is an IP address.
    $tld = strtolower(strrchr($domain, '.'));
    if (preg_match('{\.[0-9]{1,3}}', $tld) || isset($validTlds[$tld]))
    {
        // Prepend http:// if no protocol specified
        $completeUrl = $match[1][0] ? $url : "http://$url";

        // Print the hyperlink.
        printf('<a href="%s">%s</a>', htmlspecialchars($completeUrl), htmlspecialchars("$domain$port$path"));
    }
    else
    {
        // Not a valid URL.
        print(htmlspecialchars($url));
    }

    // Continue text parsing from after the URL.
    $position = $urlPosition + strlen($url);
}

// Print the remainder of the text.
print(htmlspecialchars(substr($text, $position)));

answered Oct 04 '22 01:10

Søren Løvborg

Here is something i found that is tried and tested

function make_links_blank($text)
{
  return  preg_replace(
     array(
       '/(?(?=<a[^>]*>.+<\/a>)
             (?:<a[^>]*>.+<\/a>)
             |
             ([^="\']?)((?:https?|ftp|bf2|):\/\/[^<> \n\r]+)
         )/iex',
       '/<a([^>]*)target="?[^"\']+"?/i',
       '/<a([^>]+)>/i',
       '/(^|\s)(www.[^<> \n\r]+)/iex',
       '/(([_A-Za-z0-9-]+)(\\.[_A-Za-z0-9-]+)*@([A-Za-z0-9-]+)
       (\\.[A-Za-z0-9-]+)*)/iex'
       ),
     array(
       "stripslashes((strlen('\\2')>0?'\\1<a href=\"\\2\">\\2</a>\\3':'\\0'))",
       '<a\\1',
       '<a\\1 target="_blank">',
       "stripslashes((strlen('\\2')>0?'\\1<a href=\"http://\\2\">\\2</a>\\3':'\\0'))",
       "stripslashes((strlen('\\2')>0?'<a href=\"mailto:\\0\">\\0</a>':'\\0'))"
       ),
       $text
   );
}

It works for me. And it works for emails and URL's, Sorry to answer my own question. :(

But this one is the only that works

Here is the link where i found it : http://www.experts-exchange.com/Web_Development/Web_Languages-Standards/PHP/Q_21878567.html

Sry in advance for it being a experts-exchange.

answered Oct 04 '22 01:10

Angel.King.47

You guyz are talking way to advance and complex stuff which is good for some situation, but mostly we need a simple careless solution. How about simply this?

preg_replace('/(http[s]{0,1}\:\/\/\S{4,})\s{0,}/ims', '<a href="$1" target="_blank">$1</a> ', $text_msg);

Just try it and let me know what crazy url it doesnt satisfy.

answered Oct 01 '22 01:10

Raheel Hasan

I've been using this function, it works for me

function AutoLinkUrls($str,$popup = FALSE){
    if (preg_match_all("#(^|\s|\()((http(s?)://)|(www\.))(\w+[^\s\)\<]+)#i", $str, $matches)){
        $pop = ($popup == TRUE) ? " target=\"_blank\" " : "";
        for ($i = 0; $i < count($matches['0']); $i++){
            $period = '';
            if (preg_match("|\.$|", $matches['6'][$i])){
                $period = '.';
                $matches['6'][$i] = substr($matches['6'][$i], 0, -1);
            }
            $str = str_replace($matches['0'][$i],
                    $matches['1'][$i].'<a href="http'.
                    $matches['4'][$i].'://'.
                    $matches['5'][$i].
                    $matches['6'][$i].'"'.$pop.'>http'.
                    $matches['4'][$i].'://'.
                    $matches['5'][$i].
                    $matches['6'][$i].'</a>'.
                    $period, $str);
        }//end for
    }//end if
    return $str;
}//end AutoLinkUrls

All credits goes to - http://snipplr.com/view/68586/

Enjoy!

answered Oct 02 '22 01:10

Armand

Here is the code using Regular Expressions in function

<?php
//Function definations
function MakeUrls($str)
{
$find=array('`((?:https?|ftp)://\S+[[:alnum:]]/?)`si','`((?<!//)(www\.\S+[[:alnum:]]/?))`si');

$replace=array('<a href="$1" target="_blank">$1</a>', '<a href="http://$1" target="_blank">$1</a>');

return preg_replace($find,$replace,$str);
}
//Function testing
$str="www.cloudlibz.com";
$str=MakeUrls($str);
echo $str;
?>

answered Oct 04 '22 01:10

Dharmendra Jadon

This RegEx should match any link except for these new 3+ character toplevel domains...

{
  \\b
  # Match the leading part (proto://hostname, or just hostname)
  (
    # http://, or https:// leading part
    (https?)://[-\\w]+(\\.\\w[-\\w]*)+
  |
    # or, try to find a hostname with more specific sub-expression
    (?i: [a-z0-9] (?:[-a-z0-9]*[a-z0-9])? \\. )+ # sub domains
    # Now ending .com, etc. For these, require lowercase
    (?-i: com\\b
        | edu\\b
        | biz\\b
        | gov\\b
        | in(?:t|fo)\\b # .int or .info
        | mil\\b
        | net\\b
        | org\\b
        | [a-z][a-z]\\.[a-z][a-z]\\b # two-letter country code
    )
  )

  # Allow an optional port number
  ( : \\d+ )?

  # The rest of the URL is optional, and begins with /
  (
    /
    # The rest are heuristics for what seems to work well
    [^.!,?;"\\'()\[\]\{\}\s\x7F-\\xFF]*
    (
      [.!,?]+ [^.!,?;"\\'()\\[\\]\{\\}\s\\x7F-\\xFF]+
    )*
  )?
}ix

It's not written by me, I'm not quite sure where I got it from, sorry that I can give no credit...

answered Oct 03 '22 01:10

fresskoma

this should get you email addresses:

$string = "bah bah [email protected] foo";
$match = preg_match('/[^\x00-\x20()<>@,;:\\".[\]\x7f-\xff]+(?:\.[^\x00-\x20()<>@,;:\\".[\]\x7f-\xff]+)*\@[^\x00-\x20()<>@,;:\\".[\]\x7f-\xff]+(?:\.[^\x00-\x20()<>@,;:\\".[\]\x7f-\xff]+)+/', $string, $array);
print_r($array);

// outputs:
Array
(
    [0] => [email protected]
)

answered Oct 04 '22 01:10

Stephen Fuhry

I know this answer has been accepted and that this question is quite old, but it can be useful for other people looking for other implementations.

This is a modified version of the code posted by: Angel.King.47 on July 27,09:

$text = preg_replace(
 array(
   '/(^|\s|>)(www.[^<> \n\r]+)/iex',
   '/(^|\s|>)([_A-Za-z0-9-]+(\\.[A-Za-z]{2,3})?\\.[A-Za-z]{2,4}\\/[^<> \n\r]+)/iex',
   '/(?(?=<a[^>]*>.+<\/a>)(?:<a[^>]*>.+<\/a>)|([^="\']?)((?:https?):\/\/([^<> \n\r]+)))/iex'
 ),  
 array(
   "stripslashes((strlen('\\2')>0?'\\1<a href=\"http://\\2\" target=\"_blank\">\\2</a>&nbsp;\\3':'\\0'))",
   "stripslashes((strlen('\\2')>0?'\\1<a href=\"http://\\2\" target=\"_blank\">\\2</a>&nbsp;\\4':'\\0'))",
   "stripslashes((strlen('\\2')>0?'\\1<a href=\"\\2\" target=\"_blank\">\\3</a>&nbsp;':'\\0'))",
 ),  
 $text
);

Changes:

I removed rules #2 and #3 (I'm not sure in which situations are useful).
Removed email parsing as I really don't need it.
I added one more rule which allows the recognition of URLs in the form: [domain]/* (without www). For example: "example.com/faq/" (Multiple tld: domain.{2-3}.{2-4}/)
When parsing strings starting with "http://", it removes it from the link label.
Added "target='_blank'" to all links.
Urls can be specified just after any(?) tag. For example: www.example.com

As "Søren Løvborg" has stated, this function does not escape the URLs. I tried his/her class but it just didn't work as I expected (If you don't trust your users, then try his/her code first).

answered Oct 02 '22 01:10

lepe

As I mentioned in one of the comments above my VPS, which is running php 7, started emitting warnings Warning: preg_replace(): The /e modifier is no longer supported, use preg_replace_callback instead. The buffer after the replacement was empty/false.

I have rewritten the code and made some improvements. If you think that you should be in the author section feel free to edit the comment above the function make_links_blank name. I am intentionally not using the closing php ?> to avoid inserting whitespace in the output.

<?php

class App_Updater_String_Util {
    public static function get_default_link_attribs( $regex_matches = [] ) {
        $t = ' target="_blank" ';
        return $t;
    }

    /**
     * App_Updater_String_Util::set_protocol();
     * @param string $link
     * @return string
     */
    public static function set_protocol( $link ) {
        if ( ! preg_match( '#^https?#si', $link ) ) {
            $link = 'http://' . $link;
        }
        return $link;
    }

/**
     * Goes through text and makes whatever text that look like a link an html link
     * which opens in a new tab/window (by adding target attribute).
     * 
     * Usage: App_Updater_String_Util::make_links_blank( $text );
     * 
     * @param str $text
     * @return str
     * @see http://stackoverflow.com/questions/1188129/replace-urls-in-text-with-html-links
     * @author Angel.King.47 | http://dashee.co.uk
     * @author Svetoslav Marinov (Slavi) | http://orbisius.com
     */
    public static function make_links_blank( $text ) {
        $patterns = [
            '#(?(?=<a[^>]*>.+?<\/a>)
                 (?:<a[^>]*>.+<\/a>)
                 |
                 ([^="\']?)((?:https?|ftp):\/\/[^<> \n\r]+)
             )#six' => function ( $matches ) {
                $r1 = empty( $matches[1] ) ? '' : $matches[1];
                $r2 = empty( $matches[2] ) ? '' : $matches[2];
                $r3 = empty( $matches[3] ) ? '' : $matches[3];

                $r2 = empty( $r2 ) ? '' : App_Updater_String_Util::set_protocol( $r2 );
                $res = ! empty( $r2 ) ? "$r1<a href=\"$r2\">$r2</a>$r3" : $matches[0];
                $res = stripslashes( $res );

                return $res;
             },

            '#(^|\s)((?:https?://|www\.|https?://www\.)[^<>\ \n\r]+)#six' => function ( $matches ) {
                $r1 = empty( $matches[1] ) ? '' : $matches[1];
                $r2 = empty( $matches[2] ) ? '' : $matches[2];
                $r3 = empty( $matches[3] ) ? '' : $matches[3];

                $r2 = ! empty( $r2 ) ? App_Updater_String_Util::set_protocol( $r2 ) : '';
                $res = ! empty( $r2 ) ? "$r1<a href=\"$r2\">$r2</a>$r3" : $matches[0];
                $res = stripslashes( $res );

                return $res;
            },

            // Remove any target attribs (if any)
            '#<a([^>]*)target="?[^"\']+"?#si' => '<a\\1',

            // Put the target attrib
            '#<a([^>]+)>#si' => '<a\\1 target="_blank">',

            // Make emails clickable Mailto links
            '/(([\w\-]+)(\\.[\w\-]+)*@([\w\-]+)
                (\\.[\w\-]+)*)/six' => function ( $matches ) {

                $r = $matches[0];
                $res = ! empty( $r ) ? "<a href=\"mailto:$r\">$r</a>" : $r;
                $res = stripslashes( $res );

                return $res;
            },
        ];

        foreach ( $patterns as $regex => $callback_or_replace ) {
            if ( is_callable( $callback_or_replace ) ) {
                $text = preg_replace_callback( $regex, $callback_or_replace, $text );
            } else {
                $text = preg_replace( $regex, $callback_or_replace, $text );
            }
        }

        return $text;
    }
}

answered Oct 02 '22 01:10

Svetoslav Marinov

Related questions
                            
                                Get method's arguments?
                            
                                Fatal error while upgrading Laravel 5.1 to 5.2
                            
                                How to parse JSON and access results
                            
                                php call class function by string name
                            
                                convert date string to mysql datetime field
                            
                                php.ini reload in php-cli
                            
                                How to remove both .php and .html extensions from url using NGINX?
                            
                                Getting GET "?" Variable in Laravel
                            
                                Set Font Color, Font Face and Font Size in PHPExcel
                            
                                Tell bots apart from human visitors for stats?
                            
                                Call method by string?
                            
                                Initializing Multiple PHP Variables Simultaneously
                            
                                Setting timezone to UTC (0) in PHP
                            
                                How to chain method on a newly created object?
                            
                                Current user in Magento?
                            
                                Laravel 5 not finding css files
                            
                                Laravel unique validation on multiple columns
                            
                                PHP - If/else, for, foreach, while - without curly braces?
                            
                                Mysql transactions within transactions
                            
                                How to setup CRON job to run every 10 seconds in Linux?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Replace URLs in text with HTML links

Tags:

regex

url

php

preg-replace

linkify

Angel.King.47

People also ask

9 Answers

Søren Løvborg

Angel.King.47

Raheel Hasan

Armand

Dharmendra Jadon

fresskoma

Stephen Fuhry

lepe

Svetoslav Marinov

Recent Activity

Donate For Us