Regex to find all URLs and titles

Question

I would like to extract all the URLs and link titles from a paragraph of text.

Les <a href="http://test.com/blop" class="c_link-blue">résultats du sondage</a> sur les remakes et suites souhaités sont <a href="http://test.com" class="c_link-blue">dans le blog</a>.

I can get all the href values with the following regex, but I don't know how to also get the title text between the <a></a> tags.

preg_match_all('/<a.*href="?([^" ]*)" /iU', $v['message'], $urls);
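A quick way to see what the pattern above captures. This is a sketch: `$message` is a hypothetical stand-in for `$v['message']` holding the sample paragraph. With the default `PREG_PATTERN_ORDER`, `$urls[0]` holds the full matches and `$urls[1]` the href values from the single capture group, so the link text is not captured anywhere yet.

```php
<?php
// Hypothetical stand-in for $v['message']: the sample paragraph from the question.
$message = 'Les <a href="http://test.com/blop" class="c_link-blue">résultats du sondage</a>'
    . ' sur les remakes et suites souhaités sont'
    . ' <a href="http://test.com" class="c_link-blue">dans le blog</a>.';

// The question's pattern: a single capture group, for the href value only.
preg_match_all('/<a.*href="?([^" ]*)" /iU', $message, $urls);

// PREG_PATTERN_ORDER (the default): $urls[0] = full matches, $urls[1] = group 1.
print_r($urls[1]);
```

Note that the pattern requires a space right after the closing quote (`" `), so it only works here because a `class` attribute follows each `href`.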

Ideally, I would get an associative array like this:

[0] => Array
(
   [title] => XXX
   [link] => http://test.com/blop
)
[1] => Array
(
   [title] => XXX
   [link] => http://test.com
)

Thanks for your help

Marcus · Accepted Answer

If you still insist on using a regex to solve this problem, you might be able to parse some cases with this pattern:

<a.*?href="(.*?)".*?>(.*?)</a>

Note that it doesn't use the U modifier as yours did (with U, the lazy .*? quantifiers would become greedy).
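To get the array shape from the question, one option is to run this pattern through `preg_match_all` with the `PREG_SET_ORDER` flag, which returns one array per match. A minimal sketch; `$message` and `$links` are hypothetical names, and `~` is used as the delimiter so the `/` in `</a>` needs no escaping.

```php
<?php
// Hypothetical input: the sample paragraph from the question.
$message = 'Les <a href="http://test.com/blop" class="c_link-blue">résultats du sondage</a>'
    . ' sur les remakes et suites souhaités sont'
    . ' <a href="http://test.com" class="c_link-blue">dans le blog</a>.';

// The pattern from this answer, case-insensitive, without the U modifier.
$pattern = '~<a.*?href="(.*?)".*?>(.*?)</a>~i';

$links = [];
if (preg_match_all($pattern, $message, $matches, PREG_SET_ORDER)) {
    // PREG_SET_ORDER: each $m is [0 => whole match, 1 => href, 2 => link text].
    foreach ($matches as $m) {
        $links[] = ['title' => $m[2], 'link' => $m[1]];
    }
}

print_r($links);
```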

Update: To have it accept single quotes as well as double quotes, you can use the following pattern instead:

<a.*?href=(?:"(.*?)"|'(.*?)').*?>(.*?)</a>
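With this pattern the capture groups shift: the link lands in group 1 or group 2 depending on the quote style, and the link text moves to group 3. A sketch assuming a hypothetical sample with one double-quoted and one single-quoted href; the single quotes in the pattern are escaped because it sits in a single-quoted PHP string.

```php
<?php
// Hypothetical sample mixing both quote styles.
$message = 'See <a href="http://test.com/blop">the results</a>'
    . " and <a href='http://test.com'>the blog</a>.";

$pattern = '~<a.*?href=(?:"(.*?)"|\'(.*?)\').*?>(.*?)</a>~i';

$links = [];
preg_match_all($pattern, $message, $matches, PREG_SET_ORDER);
foreach ($matches as $m) {
    // Only one of groups 1 and 2 takes part in a match; the other is reported as ''.
    $links[] = ['title' => $m[3], 'link' => $m[1] !== '' ? $m[1] : $m[2]];
}

print_r($links);
```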

VolkerK · Answer

As has been mentioned in the comments, don't use a regular expression; use a DOM parser instead, e.g.:

<?php
$doc = new DOMDocument();
$doc->loadHTML(getExampleData());

$xpath = new DOMXPath($doc);
// Select every <a> inside the <p id="abc"> element, at any depth.
foreach ($xpath->query('/html/body/p[@id="abc"]//a') as $node) {
    echo $node->getAttribute('href'), ' - ', $node->textContent, "\n";
}

function getExampleData() {
    // The Content-Type meta tag tells libxml the input is UTF-8; without it,
    // loadHTML() assumes ISO-8859-1 and mangles characters such as "é".
    return '<html><head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
    <title>...</title></head><body>
    <p>
        not <a href="wrong">this one</a> but ....
    </p>
    <p id="abc">
        Les <a href="http://test.com/blop" class="c_link-blue">résultats du sondage</a> sur les remakes et suites souhaités sont <a href="http://test.com" class="c_link-blue">dans le blog</a>.
    </p>
    </body></html>';
}

See http://docs.php.net/DOMDocument and http://docs.php.net/DOMXPath
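For the associative array asked for in the question, a DOM-based variant might look like the sketch below. It assumes the input is a UTF-8 HTML fragment rather than a full document, and `extractLinks` is a hypothetical helper name. Instead of the XPath filter on `p[@id="abc"]`, it simply collects every `<a>` element that has an `href`.

```php
<?php
/**
 * Hypothetical helper: returns one ['title' => ..., 'link' => ...] entry
 * per <a href="..."> element found in an HTML fragment.
 */
function extractLinks(string $html): array
{
    $doc = new DOMDocument();
    // Prepending a Content-Type meta tag makes libxml read the fragment as UTF-8
    // (otherwise it assumes ISO-8859-1). The flags suppress libxml errors and
    // warnings about sloppy markup.
    $doc->loadHTML(
        '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">' . $html,
        LIBXML_NOERROR | LIBXML_NOWARNING
    );

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $a) {
        if ($a->hasAttribute('href')) {
            $links[] = [
                'title' => trim($a->textContent),
                'link'  => $a->getAttribute('href'),
            ];
        }
    }
    return $links;
}

// Hypothetical input: the sample paragraph from the question.
$message = 'Les <a href="http://test.com/blop" class="c_link-blue">résultats du sondage</a>'
    . ' sur les remakes et suites souhaités sont'
    . ' <a href="http://test.com" class="c_link-blue">dans le blog</a>.';

print_r(extractLinks($message));
```

Unlike the regex versions, this does not care about attribute order, quote style, or extra whitespace inside the tag.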

Tags: regex, url, php

Simon Taisne
