 

Implementing web address regular expression

Tags: regex, php

I found the following online, but I'm having trouble implementing it:

(http|ftp|https):\/\/[\w\-_]+(\.[\w\-_]+)+([\w\-\.,@?^=%&:/~\+#]*[\w\-\@?^=%&/~\+#])?

This is what I want the PHP to do:

Take the following: Look here: http://www.rocketlanguages.com/spanish/resources/pronunciation_spanish_accents.php

And turn it into: Look here: <a href="http://www.rocketlanguages.com/spanish/resources/pronunciation_spanish_accents.php">http://www.rocketlanguages.com/span...anish_accents.php</a>

If the URL is long, the anchor text is shortened with a ... in the middle.

Jake asked Nov 12 '22 22:11



1 Answer

Try this:

// URL regex from here:
// http://daringfireball.net/2010/07/improved_regex_for_matching_urls
define( 'URL_REGEX', <<<'_END'
~(?i)\b((?:[a-z][\w-]+:(?:/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))~
_END
);

// Requires PHP 5.3 or higher: uses a closure (anonymous function)
function replace_urls_with_anchor_tags( $string,
                                        $length = 50,
                                        $elision_string = '...' ) {
    // Callback that wraps each matched URL in an anchor tag with shortened link text
    $replace_function = function( $matches ) use ( $length, $elision_string ) {
        $matched_url = $matches[ 0 ];
        return '<a href="' . $matched_url . '">' .
                abbreviated_url( $matched_url, $length, $elision_string )   .
                '</a>';
    };
    return preg_replace_callback(
        URL_REGEX,
        $replace_function,
        $string
    );
}

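// Shorten $url to at most $length characters by replacing its middle
// with $elision_string; URLs that already fit are returned unchanged.
function abbreviated_url( $url, $length = 50, $elision_string = '...' ) {
    if ( strlen( $url ) <= $length ) {
        return $url;
    }
    // Number of characters kept on each side of the elision (rounded down)
    $width_either_side = (int) ( ( $length - strlen( $elision_string ) ) / 2 );
    $left  = substr( $url, 0, $width_either_side );
    $right = substr( $url, strlen( $url ) - $width_either_side );

    return $left . $elision_string . $right;
}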

(The backtick in the URL_REGEX definition confuses stackoverflow.com's syntax highlighting, but it's nothing to be concerned about)

The function replace_urls_with_anchor_tags takes a string and changes all the URLs matched within to anchor tags, shortening long URLs by eliding with ellipses. The function takes optional length and elision_string arguments in case you wish to play around with the length and change the ellipses to something else.

Here's a usage example:

// Test it out
$test = <<<_END
Look here:
http://www.rocketlanguages.com/spanish/resources/pronunciation_spanish_accents.php

And here:
http://stackoverflow.com/questions/12385770/implementing-web-address-regular-expression
_END;

echo replace_urls_with_anchor_tags( $test, 50, '...' );
// OUTPUT:
// Look here:
// <a href="http://www.rocketlanguages.com/spanish/resources/pronunciation_spanish_accents.php">http://www.rocketlangua...ion_spanish_accents.php</a>
//
// And here:
// <a href="http://stackoverflow.com/questions/12385770/implementing-web-address-regular-expression">http://stackoverflow.co...ress-regular-expression</a>
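
To get a feel for the optional length and elision_string arguments, here is a minimal sketch that calls abbreviated_url on its own. The function body is repeated so that the snippet runs standalone, and the example.com URL is made up purely for illustration.

```php
<?php
// Copied from the answer so that this snippet runs on its own.
function abbreviated_url( $url, $length = 50, $elision_string = '...' ) {
    if ( strlen( $url ) <= $length ) {
        return $url;
    }
    $width_either_side = (int) ( ( $length - strlen( $elision_string ) ) / 2 );
    $left  = substr( $url, 0, $width_either_side );
    $right = substr( $url, strlen( $url ) - $width_either_side );

    return $left . $elision_string . $right;
}

// Hypothetical URL, used only for illustration (48 characters long).
$url = 'http://example.com/a/very/long/path/to/file.html';

// Fits within 60 characters, so it comes back unchanged.
$unchanged = abbreviated_url( $url, 60 );

// Too long for 30 characters: (30 - 5) / 2 rounds down to 12 per side.
$shortened = abbreviated_url( $url, 30, '[...]' );

echo $unchanged, "\n";
echo $shortened, "\n";
echo strlen( $shortened ), "\n";
```

Because the width on each side is rounded down, the result can come out one character shorter than the requested length (29 characters here for a length of 30).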

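Note that if you are using PHP 5.2 or lower, you must rewrite replace_urls_with_anchor_tags to use create_function instead of a closure, because closures were not introduced until PHP 5.3. The nowdoc syntax (<<<'_END') used to define URL_REGEX also requires PHP 5.3, so on 5.2 the pattern has to be written as an ordinary quoted string. create_function itself was deprecated in PHP 7.2 and removed in PHP 8.0, so this variant is only for those old versions: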

// No closures in PHP 5.2, must use create_function()
function replace_urls_with_anchor_tags( $string,
                                        $length = 50,
                                        $elision_string = '...' ) {
    $replace_function = create_function(
        '$matches',
        'return "<a href=\"$matches[0]\">" .
                abbreviated_url( $matches[ 0 ], '            .
                                 $length  . ', '             .
                                 '"' . $elision_string . '"' .
                               ') . "</a>";'
    );
    return preg_replace_callback(
        URL_REGEX,
        $replace_function,
        $string
    );
}
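
As an alternative to building code in a string, here is a rough sketch of a callback written as an object method, which should work on PHP 5.2 as well as on current versions. The class name UrlAnchorReplacer and the simplified pattern are my own placeholders, not part of the code above; the URL_REGEX constant would be used in place of the stand-in pattern.

```php
<?php
// Hypothetical helper class (an assumption for illustration). A callback of
// the form array( $object, 'method' ) needs neither closures nor create_function.
class UrlAnchorReplacer {
    private $length;
    private $elision_string;

    public function __construct( $length = 50, $elision_string = '...' ) {
        $this->length         = $length;
        $this->elision_string = $elision_string;
    }

    // Same shortening rule as abbreviated_url() in the answer.
    public function abbreviate( $url ) {
        if ( strlen( $url ) <= $this->length ) {
            return $url;
        }
        $width = (int) ( ( $this->length - strlen( $this->elision_string ) ) / 2 );
        return substr( $url, 0, $width )
             . $this->elision_string
             . substr( $url, strlen( $url ) - $width );
    }

    // Callback for preg_replace_callback(); $matches[0] is the whole matched URL.
    public function replace( $matches ) {
        return '<a href="' . $matches[0] . '">'
             . $this->abbreviate( $matches[0] )
             . '</a>';
    }
}

// Simplified stand-in pattern for illustration only.
$simple_url_regex = '~https?://[^\s<>"]+~i';

$replacer = new UrlAnchorReplacer( 30, '...' );
$result   = preg_replace_callback(
    $simple_url_regex,
    array( $replacer, 'replace' ),
    'See http://stackoverflow.com/questions/12385770/implementing-web-address-regular-expression'
);

echo $result, "\n";
```

The length and elision string live on the object, so nothing has to be spliced into generated code. If the input text can contain characters such as & or ", passing the URL through htmlspecialchars() before building the tag would be a sensible addition.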

Note that I replaced the URL regex you had found with the one linked from the page DaveRandom referred to in his comment. It's more complete, and the regex you were using actually contains a mistake: a couple of '/' characters are not escaped (here: [\w\-\.,@?^=%&:/~\+#]*[\w\-\@?^=%&/~\+#]), which ends the pattern early if '/' is the delimiter. Also, it has no explicit handling for port numbers like 80 or 8080.
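
To illustrate the unescaped '/' problem, here is a minimal sketch. It assumes the pattern from the question was wrapped in '/' delimiters (the question doesn't say which delimiter was used); '!' is just one character that doesn't occur anywhere in the pattern.

```php
<?php
// Pattern body copied from the question, without delimiters.
$body = '(http|ftp|https):\/\/[\w\-_]+(\.[\w\-_]+)+([\w\-\.,@?^=%&:/~\+#]*[\w\-\@?^=%&/~\+#])?';
$text = 'Look here: http://www.rocketlanguages.com/spanish/resources/pronunciation_spanish_accents.php';

// With "/" as the delimiter, the first unescaped "/" ends the pattern and the
// rest is read as modifiers. PHP raises a warning (suppressed here with @)
// and preg_match() returns false.
$with_slash = @preg_match( '/' . $body . '/', $text, $slash_matches );

// With "!" as the delimiter, the same body compiles and matches the URL.
$with_bang = preg_match( '!' . $body . '!', $text, $bang_matches );

var_dump( $with_slash );
var_dump( $with_bang );
echo $bang_matches[0], "\n";
```

Picking a delimiter that never appears in the pattern avoids having to escape anything, which is presumably also why the URL_REGEX above uses ~ as its delimiter.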

Hope this helps.

David answered Nov 15 '22 11:11
