I am trying to extract multiple URLs from an HTML file with regex. There are other URLs in the file, so the only pattern I have is "tableentries." and "".
HTML code example:
<tr class="tableentries2">
<td>
<a href="http://example.com/all-files/files/00000000789/">Click Here</a>
</td>
PHP I wrote:
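$html = "value of the code above";
// "#" is used as the delimiter so the "/" in </td> needs no escaping;
// the "s" modifier lets "." match across line breaks.
if (preg_match_all('#<td>.*?</td>#s', $html, $match)) {
    foreach ($match[0] as $x) {
        echo $x . "<br>";
    }
}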
Why not just look for href values? (Updated because the edited code now has quotation marks.)
preg_match_all('/href="([^\s"]+)/', $html, $match);
Then the first URI would be in $match[1][0], and $match[1] would hold every captured URI.
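A minimal sketch of that pattern applied to several links, assuming markup shaped like the question's example (the second row and its URL are made up for illustration). With the default flags, preg_match_all groups results by capture group, so the captured URIs can be read from $match[1] in document order.

```php
<?php
// Sample markup modeled on the question; the second row is a hypothetical addition.
$html = <<<HTML
<tr class="tableentries2">
<td>
<a href="http://example.com/all-files/files/00000000789/">Click Here</a>
</td>
</tr>
<tr class="tableentries1">
<td>
<a href="http://example.com/all-files/files/00000000790/">Click Here</a>
</td>
</tr>
HTML;

// Capture everything after href=" up to the next whitespace or double quote.
preg_match_all('/href="([^\s"]+)/', $html, $match);

// $match[1] holds the first capture group of every match.
$urls = $match[1];

foreach ($urls as $url) {
    echo $url . "\n";
}
```

Note that this picks up every double-quoted href in the input, so it only fits if the other URLs in the file are acceptable or are filtered out afterwards.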
You really shouldn't use regex to parse HTML. DOMDocument is actually very easy to use for this type of thing. Here is a simple example.
<?php
error_reporting(E_ALL);
$html = "
<table>
<tr>
<td>
<a href='http://www.test1-1.com'>test1-1</a>
</td>
<td>
<a href='http://www.test1-2.com'>test1-2</a>
</td>
<td>
<a href='http://www.test1-3.com'>test1-3</a>
</td>
</tr>
<tr>
<td>
<a href='http://www.test2-1.com'>test2-1</a>
</td>
<td>
<a href='http://www.test2-2.com'>test2-2</a>
</td>
<td>
<a href='http://www.test2-3.com'>test2-3</a>
</td>
</tr>
</table>";
$DOM = new DOMDocument();
// Load the HTML string into the DOMDocument.
$DOM->loadHTML($html);
// Get a list of all <a> tags.
$a = $DOM->getElementsByTagName('a');
// Loop through all <a> tags.
foreach ($a as $link) {
    // Echo the href attribute of the <a> tag.
    echo $link->getAttribute('href') . '<br />';
}
?>
This would output:
http://www.test1-1.com
http://www.test1-2.com
http://www.test1-3.com
http://www.test2-1.com
http://www.test2-2.com
http://www.test2-3.com
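Since the question mentions other URLs in the same file, the DOM approach can be narrowed with DOMXPath. The following is only a sketch under assumptions: it takes the wanted links to sit inside rows whose class starts with "tableentries" (as in the question's snippet), and the "navigation" row is a made-up example of a link that should be skipped.

```php
<?php
// Sample markup modeled on the question; the "navigation" row is hypothetical.
$html = <<<HTML
<table>
<tr class="navigation">
<td><a href="http://example.com/home/">Home</a></td>
</tr>
<tr class="tableentries2">
<td>
<a href="http://example.com/all-files/files/00000000789/">Click Here</a>
</td>
</tr>
</table>
HTML;

// Collect libxml parse warnings internally instead of printing them.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
// href attributes of <a> tags anywhere below a <tr> whose class
// attribute starts with "tableentries".
$nodes = $xpath->query('//tr[starts-with(@class, "tableentries")]//a/@href');

$urls = [];
foreach ($nodes as $attr) {
    $urls[] = $attr->nodeValue;
}

foreach ($urls as $url) {
    echo $url . "\n";
}
```

The starts-with() test is used because the question only gives "tableentries" as the common prefix; if a row carries several classes, a contains() test may be the better fit.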