I'm simply trying to get the data from all the <td> elements inside the <tr> elements. My problem is that, because of the structure of the table I'm trying to scrape, I need to exclude all elements with a colspan attribute, i.e. <td colspan="12">.
Getting the table data is simple enough, as the code below shows, but because of the table structure I need to exclude every cell that has a colspan attribute.
<?php
$html = file_get_contents('http://www.superxv.com/fixtures/'); // get the HTML returned from this URL
$game_doc = new DOMDocument();
libxml_use_internal_errors(TRUE); // disable libxml errors
if (!empty($html)) { // if any HTML is actually returned
    $game_doc->loadHTML($html);
    libxml_clear_errors(); // remove errors
    $xpath = new DOMXPath($game_doc);
    // Modify the XPath query to match the content
    foreach ($xpath->query('//table')->item(0)->getElementsByTagName('tr') as $rows) {
        $cells = $rows->getElementsByTagName('td');
        //$cells2 = $rows->getElementsByTagName('th');
        echo '<pre>';
        //@ signs are added due to table structure
        // Get scraped columns
        echo $dayDateBye[] = $cells->item(0)->textContent;
        echo $homeTeam[] = $cells->item(1)->textContent;
        echo $awayTeam[] = $cells->item(2)->textContent;
        echo $venue[] = $cells->item(3)->textContent;
        echo $timeGMT[] = $cells->item(5)->textContent;
        echo $timeZA[] = $cells->item(10)->textContent;
        echo '</pre>';
    }
}
Here you can see the table structure: it shows five or so rows of fixtures and then changes structure when a new week starts. The only elements I can use to skip over this change of structure are the <td colspan="12"> elements, which makes it tricky, since the <td> elements do not have a class name, only the attribute to identify them by.


Any input appreciated.
You can skip those rows by checking how many <td> cells each row contains:
<?php
$html = file_get_contents('http://www.superxv.com/fixtures/'); // get the HTML returned from this URL
$game_doc = new DOMDocument();
libxml_use_internal_errors(TRUE); // disable libxml errors
if (!empty($html)) { // if any HTML is actually returned
    $game_doc->loadHTML($html);
    libxml_clear_errors(); // remove errors
    $xpath = new DOMXPath($game_doc);
    // Modify the XPath query to match the content
    foreach ($xpath->query('//table')->item(0)->getElementsByTagName('tr') as $rows) {
        $cells = $rows->getElementsByTagName('td');
        // A row holding only the single <td colspan="12"> cell has length 1, so skip it
        if ($cells->length > 1) {
            //$cells2 = $rows->getElementsByTagName('th');
            echo '<pre>';
            // Get scraped columns
            echo $dayDateBye[] = $cells->item(0)->textContent;
            echo $homeTeam[] = $cells->item(1)->textContent;
            echo $awayTeam[] = $cells->item(2)->textContent;
            echo $venue[] = $cells->item(3)->textContent;
            echo $timeGMT[] = $cells->item(5)->textContent;
            echo $timeZA[] = $cells->item(10)->textContent;
            echo '</pre>';
        }
    }
}
?>
You can also use XPath to exclude the cells that have colspan attributes. So instead of:
$cells = $rows->getElementsByTagName('td');
Use:
$cells = $xpath->query('td[not(@colspan)]', $rows);
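To see how that relative query behaves inside the loop, here is a minimal sketch run against a small inline table instead of the live page. The sample markup, the four-column layout and the $fixtures array are assumptions made up for illustration; the real fixtures table has more columns, so the indexes would need adjusting.

```php
<?php
// Hypothetical sample markup standing in for the real fixtures page.
$html = <<<HTML
<table>
  <tr><td colspan="12">Week 1</td></tr>
  <tr><td>Fri 14 Feb</td><td>Blues</td><td>Chiefs</td><td>Eden Park</td></tr>
  <tr><td>Sat 15 Feb</td><td>Sharks</td><td>Bulls</td><td>Kings Park</td></tr>
  <tr><td colspan="12">Week 2</td></tr>
  <tr><td>Fri 21 Feb</td><td>Lions</td><td>Reds</td><td>Ellis Park</td></tr>
</table>
HTML;

libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($doc);

$fixtures = [];
// (//table)[1] picks the first table in the document, like ->item(0) in the question.
foreach ($xpath->query('(//table)[1]//tr') as $row) {
    // Relative query: only the <td> children of this row that have no colspan attribute.
    $cells = $xpath->query('td[not(@colspan)]', $row);
    if ($cells->length < 4) {
        continue; // separator row (or an unexpected layout): nothing to read
    }
    $fixtures[] = [
        'date'  => trim($cells->item(0)->textContent),
        'home'  => trim($cells->item(1)->textContent),
        'away'  => trim($cells->item(2)->textContent),
        'venue' => trim($cells->item(3)->textContent),
    ];
}

print_r($fixtures);
```

Because the colspan cells are filtered out, the week separator rows end up with zero matching cells, so the length check skips them and item() is never called on a missing index.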