PHP regex to get string inside href tag

I need a regex that gives me the string inside the quotes of an href attribute.

For example, I need to extract theurltoget.com from the following:

<a href="theurltoget.com">URL</a>

Additionally, I only want the base URL part, i.e. from http://www.mydomain.com/page.html I only want http://www.mydomain.com/

David asked Oct 22 '10 22:10


4 Answers

Don't use regex for this. You can use XPath and built-in PHP functions to get what you want:

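    // simplexml_load_string() only parses well-formed XML/XHTML
    $xml = simplexml_load_string($myHtml);
    $list = $xml->xpath("//@href");

    $preparedUrls = array();
    foreach ($list as $item) {
        // Cast the attribute node to a string before parsing it
        $parts = parse_url((string) $item);
        // Skip relative links, which have no scheme or host
        if (isset($parts['scheme'], $parts['host'])) {
            $preparedUrls[] = $parts['scheme'] . '://' . $parts['host'] . '/';
        }
    }
    print_r($preparedUrls);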
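
Since simplexml_load_string() expects well-formed XML, it can fail on typical HTML pages. Below is a minimal sketch of the same idea using DOMDocument instead, which is a different parser from the one used in this answer. The function name extractBaseUrls and the sample markup are made up for illustration, and the type declarations assume PHP 7 or later.

```php
<?php
// Hypothetical helper: returns the base URL of every absolute link in $html.
function extractBaseUrls(string $html): array
{
    $doc = new DOMDocument();
    // Collect libxml warnings internally instead of printing them,
    // since real-world HTML is rarely perfectly valid
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $baseUrls = array();
    foreach ($doc->getElementsByTagName('a') as $anchor) {
        $parts = parse_url($anchor->getAttribute('href'));
        // Relative links have no scheme or host, so skip them
        if (isset($parts['scheme'], $parts['host'])) {
            $baseUrls[] = $parts['scheme'] . '://' . $parts['host'] . '/';
        }
    }
    return $baseUrls;
}

// Sample input (assumed): one absolute link with extra attributes, one relative link
$sample = '<p><a href="http://www.mydomain.com/page.html" class="myclass">URL</a> '
        . '<a href="/local.html">local</a></p>';
print_r(extractBaseUrls($sample));
```

The relative link is skipped because parse_url() finds no scheme or host in it; whether that is the right behavior depends on what you need the list for.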
Drew Hunter answered Nov 06 '22 15:11


$html = '<a href="http://www.mydomain.com/page.html">URL</a>';

// Capture everything between href=" and the closing ">
preg_match('/<a href="(.+)">/', $html, $match);

// Split the captured URL into its components
$info = parse_url($match[1]);

echo $info['scheme'].'://'.$info['host']; // http://www.mydomain.com
Alec answered Nov 06 '22 17:11


This expression handles three cases:

  1. no quotes
  2. double quotes
  3. single quotes

'/href=["\']?([^"\'>]+)["\']?/'
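
As a quick check of the three cases listed above, here is a small sketch that runs the pattern against one sample per case. The example.com URLs and variable names are assumptions made up for the demonstration.

```php
<?php
// The pattern from this answer
$pattern = '/href=["\']?([^"\'>]+)["\']?/';

// One hypothetical sample per case: no quotes, double quotes, single quotes
$samples = array(
    '<a href=http://example.com/a>no quotes</a>',
    '<a href="http://example.com/b">double quotes</a>',
    "<a href='http://example.com/c'>single quotes</a>",
);

$found = array();
foreach ($samples as $sample) {
    // $match[1] holds the text captured by the parenthesized group
    if (preg_match($pattern, $sample, $match)) {
        $found[] = $match[1];
    }
}
print_r($found);
```

One caveat: the character class does not exclude whitespace, so an unquoted href followed by other attributes (for example href=foo class=bar) would be captured together with those attributes.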

ishubin answered Nov 06 '22 16:11


Use the answer by @Alec only if you're looking for the base URL part (the second part of the question by @David). When the tag has more attributes, the greedy (.+) also captures them, so only the scheme and host are reliable:

$html = '<a href="http://www.mydomain.com/page.html" class="myclass" rel="myrel">URL</a>';
preg_match('/<a href="(.+)">/', $html, $match);
$info = parse_url($match[1]);

This will give you:

$info
Array
(
    [scheme] => http
    [host] => www.mydomain.com
    [path] => /page.html" class="myclass" rel="myrel
)

So you can use $href = $info["scheme"] . "://" . $info["host"]; which gives you:

// http://www.mydomain.com  

If you need the entire URL inside the href, use a different regex, for instance the one provided by @user2520237:

$html = '<a href="http://www.mydomain.com/page.html" class="myclass" rel="myrel">URL</a>';
preg_match('/href=["\']?([^"\'>]+)["\']?/', $html, $match);
$info = parse_url($match[1]);

This will give you:

$info
Array
(
    [scheme] => http
    [host] => www.mydomain.com
    [path] => /page.html
)

Now you can use $href = $info["scheme"] . "://" . $info["host"] . $info["path"]; which gives you:

// http://www.mydomain.com/page.html
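
Putting the two steps together, a minimal sketch of a helper that returns the base URL with the trailing slash the question asks for might look like this. The name hrefBaseUrl is hypothetical and not taken from any answer here, and the nullable return type assumes PHP 7.1 or later.

```php
<?php
// Hypothetical helper: base URL of the first href found in $html,
// or null when there is no href or the link is relative.
function hrefBaseUrl(string $html): ?string
{
    // Quote-tolerant pattern quoted earlier in this answer
    if (!preg_match('/href=["\']?([^"\'>]+)["\']?/', $html, $match)) {
        return null;
    }
    $info = parse_url($match[1]);
    // Relative URLs carry no scheme or host
    if (!isset($info['scheme'], $info['host'])) {
        return null;
    }
    return $info['scheme'] . '://' . $info['host'] . '/';
}

$html = '<a href="http://www.mydomain.com/page.html" class="myclass" rel="myrel">URL</a>';
echo hrefBaseUrl($html), "\n";
```

Returning null instead of a partial string keeps the caller from building a URL like "://" out of a relative link.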
Linkmichiel answered Nov 06 '22 17:11