PHP Regex HTML - Extract URL

Question

I am trying to extract multiple URLs from an HTML file with regex. There are other URLs in the file, so the only pattern I have is "tableentries." and ""

HTML code example:

<tr class="tableentries2">
  <td>
    <a href="http://example.com/all-files/files/00000000789/">Click Here</a>
  </td>

PHP I wrote:

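$html = "value of the code above";
// Use # as the delimiter so the / in </td> needs no escaping.
// The s modifier lets . match newlines; .*? keeps each match to one cell.
if (preg_match_all('#<td>.*?</td>#s', $html, $match)) {
    foreach ($match[0] as $x) {
        echo $x . "<br>";
    }
}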

sdleihssirhc · Accepted Answer

Why not just look for href values? (Updated because the edited code now has quotation marks.)

preg_match_all('/href="([^\s"]+)/', $html, $match);

Then the first URI would be in $match[1][0], and $match[1] holds every captured URI.
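
As a minimal sketch of how the capture group is read back, the snippet below runs the same pattern over a small sample and collects every match from $match[1]. The sample markup (including the second row and its URL) and the variable name $urls are assumptions for illustration, not part of the original question. Note that this pattern only handles double-quoted href values.

```php
<?php
// Hypothetical sample input; the second row and its URL are made up for illustration.
$html = <<<HTML
<tr class="tableentries2">
  <td>
    <a href="http://example.com/all-files/files/00000000789/">Click Here</a>
  </td>
</tr>
<tr class="tableentries1">
  <td>
    <a href="http://example.com/all-files/files/00000000790/">Click Here</a>
  </td>
</tr>
HTML;

// Same pattern as in the answer: capture everything after href=" up to
// the first whitespace character or double quote.
preg_match_all('/href="([^\s"]+)/', $html, $match);

// $match[0] holds the full matches, $match[1] holds the captured URLs.
$urls = $match[1];

foreach ($urls as $url) {
    echo $url . "\n";
}
```

Because the pattern looks at every href in the input, it would also pick up links outside the table rows if the file contains any.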

Jonathan Kuhn · Answer

You really shouldn't use regex to parse HTML. DOMDocument is actually very easy to use for this type of thing. Here is a simple example.

<?php
error_reporting(E_ALL);
$html = "
<table>
    <tr>
        <td>
            <a href='http://www.test1-1.com'>test1-1</a>
        </td>
        <td>
            <a href='http://www.test1-2.com'>test1-2</a>
        </td>
        <td>
            <a href='http://www.test1-3.com'>test1-3</a>
        </td>
    </tr>
    <tr>
        <td>
            <a href='http://www.test2-1.com'>test2-1</a>
        </td>
        <td>
            <a href='http://www.test2-2.com'>test2-2</a>
        </td>
        <td>
            <a href='http://www.test2-3.com'>test2-3</a>
        </td>
    </tr>
</table>";

$DOM = new DOMDocument();
//load the html string into the DOMDocument
$DOM->loadHTML($html);
//get a list of all <A> tags
$a = $DOM->getElementsByTagName('a');
//loop through all <A> tags
foreach($a as $link){
    //echo out the href attribute of the <A> tag.
    echo $link->getAttribute('href').'<br />';
}
?>

This would output:

http://www.test1-1.com
http://www.test1-2.com
http://www.test1-3.com
http://www.test2-1.com
http://www.test2-2.com
http://www.test2-3.com
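
If only the links inside certain rows are wanted, as with the "tableentries" rows in the question, an XPath query can narrow the selection. The following is a sketch under assumptions: the sample markup, the class-prefix test, and the variable names are illustrative, and DOMXPath is swapped in here in place of getElementsByTagName.

```php
<?php
// Hypothetical sample: one row to keep and one row to skip.
$html = '
<table>
    <tr class="tableentries2">
        <td><a href="http://example.com/all-files/files/00000000789/">Click Here</a></td>
    </tr>
    <tr class="other">
        <td><a href="http://example.com/elsewhere/">Skip me</a></td>
    </tr>
</table>';

$DOM = new DOMDocument();
// Collect libxml parse warnings internally instead of printing them.
libxml_use_internal_errors(true);
$DOM->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($DOM);
// Select <a> elements with an href that sit inside a <tr>
// whose class attribute starts with "tableentries".
$links = $xpath->query('//tr[starts-with(@class, "tableentries")]//a[@href]');

$urls = [];
foreach ($links as $link) {
    $urls[] = $link->getAttribute('href');
}

echo implode("\n", $urls) . "\n";
```

The starts-with test assumes "tableentries" is the first class name on the row; if other classes can come before it, contains(@class, "tableentries") is a looser alternative.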

Tags: html, regex, php

Rajesh Muntari
