
Simple HTML DOM Parser: scraping HTML content that has no id or class

I'm scraping a webpage for values and storing them in an array. At the moment I can pull in all the td.Place values because that cell has a class.

Note: I'm using Simple HTML DOM Parser

My current code that works:

<?php 

include('simple_html_dom.php');
$html = file_get_html('http://www...');

// initialize empty array to store the data array from each row
$theData3 = array();

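// loop over each table row (this line was cut off in the post; the 'tr' selector is an assumption)
foreach($html->find('tr') as $row)
{

// initialize array to store the cell data from each row
$rowData3 = array();
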
foreach($row->find('td.Place') as $cell) 
{

// push the cell's text to the array
$rowData3[] = $cell->innertext;

}
// push the row's data array to the 'big' array
$theData3[] = $rowData3;

}

print_r($theData3);
 ?>

What's the Issue?

I want to pull in the values 100 and -3, which are the first two td's that come right after the td with class="Grad". Because those two td's have no id or class, I'm finding it difficult.

This is the HTML I am currently scraping:

<tr class="PersonrRow odd">
        <td></td>
        <td class="place">T9</td>
        <td>
        <span class="rank"></span>16</td>
        <td class="Grad">-7
        </td>
        <td>
        100
        </td>
        <td>
        -3
        </td>
        <td>
        712
        </td>
        <td>
        682
        </td>
        <td>
        702
        </td>
        <td>
        68
        </td>
        <td class="person large"></td>
        <td style="">
        277
        </td>
    </tr>
Helena asked Dec 11 '25 20:12


1 Answer

After some research and digging around in my old files, here is what I came up with. You don't need any extra libraries for this; PHP's built-in DOMDocument is enough:

<?php
    $thedata3 = array();
    $rowdata3 = array();

    // keep libxml from printing warnings on imperfect real-world HTML
    libxml_use_internal_errors(true);

    $DOM = new DOMDocument();
    $DOM->loadHTMLFile("file path or url");

    // get the actual table itself (replace tableID with your table's id)
    $xpath = new DOMXPath($DOM);
    $table = $xpath->query('//table[@id="tableID"]')->item(0);

    // stop early if no table matched, instead of failing on a null below
    if ($table === null) {
        exit("Table not found");
    }

    $rows = $table->getElementsByTagName("tr");

    for ($i = 0; $i < $rows->length; $i++) {
        $cols = $rows->item($i)->getElementsByTagName("td");
        for ($j = 0; $j < $cols->length; $j++) {
            // to grab a single column instead, use $cols->item(N), where N is
            // the 0-based number of the column you're after (and drop this inner loop)
            array_push($rowdata3, $cols->item($j)->nodeValue);
        }
        array_push($thedata3, $rowdata3);
        $rowdata3 = array(); // empty the $rowdata3 array for fresh results
    }
?>

This is the best I can do with what you've provided, but I hope it helps in some way. Please let me know if you need any more assistance.
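If you only need the two cells that follow the td with class="Grad", an XPath sibling query is one possible alternative to looping over every column. This is only a sketch: it assumes the row markup from the question (inlined here as a string so it runs on its own, trimmed to the first few cells) and uses the XPath `following-sibling` axis. The variable names are just for illustration.

```php
<?php
// Sample row taken from the question; in practice this would be the scraped page.
$html = <<<'HTML'
<table>
  <tr class="PersonrRow odd">
    <td></td>
    <td class="place">T9</td>
    <td><span class="rank"></span>16</td>
    <td class="Grad">-7
    </td>
    <td>
    100
    </td>
    <td>
    -3
    </td>
    <td>
    712
    </td>
  </tr>
</table>
HTML;

libxml_use_internal_errors(true); // keep parser warnings quiet
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$result = array();
foreach ($xpath->query('//tr') as $row) {
    // first two <td> siblings that come after the class="Grad" cell in this row
    $cells = $xpath->query('td[@class="Grad"]/following-sibling::td[position() <= 2]', $row);

    $values = array();
    foreach ($cells as $cell) {
        $values[] = trim($cell->nodeValue); // strip the whitespace around the number
    }
    if ($values) {
        $result[] = $values; // skip rows that have no Grad cell
    }
}

print_r($result);
```

For the sample row this collects 100 and -3 as strings. Note that `@class="Grad"` only matches when the class attribute is exactly "Grad"; if the real page adds more class names to that cell, the expression would need adjusting.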

For ease of access and readability, I would recommend just throwing everything into the array like you planned. Then, after you have scraped all the data, manipulate the array and pull what you want from it. That should be easier.
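As a rough sketch of that second step, here is one way to pull two columns out of an array that already holds every cell. The data is hand-written to mirror the row in the question, and the positions 4 and 5 are an assumption based on that markup (counting td's from 0), so check them against the real table.

```php
<?php
// Hypothetical scraped result: one inner array per row, one entry per <td>.
$thedata3 = array(
    array('', 'T9', '16', '-7', '100', '-3', '712', '682', '702', '68', '', '277'),
);

// Assumed 0-based positions of the two cells that follow the Grad cell.
$wanted = array(4, 5);

$picked = array();
foreach ($thedata3 as $row) {
    $values = array();
    foreach ($wanted as $index) {
        // guard against short rows such as header or spacer rows
        if (isset($row[$index])) {
            $values[] = trim($row[$index]);
        }
    }
    $picked[] = $values;
}

print_r($picked);
```

Picking by position is simple but fragile: if the site adds or removes a column, the indexes have to be updated.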

References

PHP.net DOMDocument: http://php.net/manual/en/class.domdocument.php

PHP.net DOMXPath: http://php.net/manual/en/class.domxpath.php

These pages document the DOMDocument and DOMXPath classes and should have everything you need to get started!

Mark Hill answered Dec 13 '25 10:12


