Simple HTML DOM Parser - Scraping HTML content that has no id or class

Question

I'm scraping a webpage for values and storing them in an array. At the moment I can pull in all the td.Place values because that cell has a class.

Note: I'm using Simple HTML DOM Parser

My current code that works:

<?php

include('simple_html_dom.php');
$html = file_get_html('http://www...');

// initialize empty array to store the data array from each row
$theData3 = array();

// loop over each table row (the 'tr' selector is an assumption)
foreach ($html->find('tr') as $row) {

    // initialize array to store the cell data from each row
    $rowData3 = array();

    foreach ($row->find('td.Place') as $cell) {
        // push the cell's text to the array
        $rowData3[] = $cell->innertext;
    }

    // push the row's data array to the 'big' array
    $theData3[] = $rowData3;
}

print_r($theData3);
?>

What's the Issue?

I want to pull in the values 100 and -3, which sit in the first two td elements that follow the td with class="Grad". Because those two td elements have no id or class, I'm finding it difficult to select them.

This is the HTML I am currently scraping:

<tr class="PersonrRow odd">
        <td></td>
        <td class="place">T9</td>
        <td>
        <span class="rank"></span>16</td>
        <td class="Grad">-7
        </td>
        <td>
        100
        </td>
        <td>
        -3
        </td>
        <td>
        712
        </td>
        <td>
        682
        </td>
        <td>
        702
        </td>
        <td>
        68
        </td>
        <td class="person large"></td>
        <td style="">
        277
        </td>
    </tr>

Mark Hill · Accepted Answer

Okay, so after doing some research and digging around my old files, here is what I have come up with for you. You're not going to need any fancy plugins or anything, just PHP's built-in DOMDocument:

<?php
    $thedata3 = array();
    $rowdata3 = array();

    // real-world HTML is rarely valid; collect parser warnings quietly
    libxml_use_internal_errors(true);

    $DOM = new DOMDocument();
    $DOM->loadHTMLFile("file path or url");

    // get the actual table itself
    $xpath = new DOMXPath($DOM);
    $table = $xpath->query('//table[@id="tableID"]')->item(0);

    // item(0) returns null when no table matches the query
    if ($table === null) {
        die("Table not found");
    }

    $rows = $table->getElementsByTagName("tr");

    for ($i = 0; $i < $rows->length; $i++) {
        $cols = $rows->item($i)->getElementsByTagName("td");
        for ($j = 0; $j < $cols->length; $j++) {
            // to grab a single column instead of all of them, drop this inner
            // loop and pass that column's zero-based index to $cols->item()
            array_push($rowdata3, $cols->item($j)->nodeValue);
        }
        array_push($thedata3, $rowdata3);
        $rowdata3 = array(); // empty the $rowdata3 array for fresh results
    }
?>
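
If you only want the two cells that come right after the Grad cell, one option is to let XPath do the selecting with the following-sibling axis. This is only a rough sketch: it parses a trimmed copy of the row from the question as an inline string, and the table id "tableID" is a placeholder, so adjust both to the real page.

```php
<?php
// Trimmed copy of the sample row from the question (assumed markup);
// for the live page, use loadHTMLFile() instead of loadHTML().
$html = <<<HTML
<table id="tableID">
<tr class="PersonrRow odd">
  <td></td>
  <td class="place">T9</td>
  <td><span class="rank"></span>16</td>
  <td class="Grad">-7</td>
  <td>100</td>
  <td>-3</td>
  <td>712</td>
</tr>
</table>
HTML;

libxml_use_internal_errors(true); // keep warnings about imperfect HTML quiet
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$theData3 = array();

// visit every row that contains a td with class="Grad"
foreach ($xpath->query('//tr[td[@class="Grad"]]') as $row) {
    $rowData3 = array();

    // the first two td siblings that come after td.Grad within this row
    $cells = $xpath->query(
        'td[@class="Grad"]/following-sibling::td[position() <= 2]',
        $row
    );

    foreach ($cells as $cell) {
        $rowData3[] = trim($cell->nodeValue);
    }
    $theData3[] = $rowData3;
}

print_r($theData3);
```

For the sample row this gives a single inner array holding 100 and -3. The selector depends on the Grad cell carrying exactly class="Grad"; if the real page adds other class names to that cell, the @class test would need to be loosened.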

This is the best I can do with what you've provided me, but I hope it helps in some way. Please let me know if you need any more assistance.

For ease of access and readability, I would recommend just throwing everything into the array like you planned and then, after you have scraped all the data, manipulating the array and pulling what you want from it. That should be easier.
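
As a rough illustration of that second step, the sketch below picks two columns out of an already-scraped array. The sample data is hand-written from the row in the question, and it assumes the Grad cell is always the fourth td (index 3), so the two values after it are at indexes 4 and 5. The $wanted name is only for illustration.

```php
<?php
// Hypothetical result of the scraping loop: one inner array per row,
// holding the raw text of every td (whitespace included).
$thedata3 = array(
    array('', 'T9', '16', "-7\n        ", "\n        100\n        ",
          "\n        -3\n        ", '712', '682', '702', '68', '', '277'),
);

$wanted = array();
foreach ($thedata3 as $row) {
    // take two cells starting at index 4, then strip surrounding whitespace
    $wanted[] = array_map('trim', array_slice($row, 4, 2));
}

print_r($wanted);
```

If the column layout differs between rows, a fixed offset like this will pick the wrong cells, so it is worth checking a few rows of the scraped array first.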

References

PHP.net DOMDocument http://php.net/manual/en/class.domdocument.php

PHP.net DOMXPath http://php.net/manual/en/class.domxpath.php

These two pages cover the DOMDocument and DOMXPath classes and should have everything you need to get started.
