Here is an example of the HTML I need to parse in a PHP program:
<div id="dump-list">
<div class="dump-row">
<div class="dump-location odd" data-jmapping="{id: 35, point: {lng: -73.00898601, lat: 41.71727402}, category: 'office'}">
<div class="SingleLinkNoTx">
<a href="#10" class="loc-link">Acme Software</a><br/><strong>John Doe, MBA</strong><br/>123 Main St.<br />New York, NY 10036<br /><strong class="telephone">(212) 555-1234</strong><br/>
</div><!-- END.SingleLinkNoTx -->
<a href="http://www.example.com" target="_blank" class="web_link">Visit Website</a><span><br />(0.3 miles)</span>
<div class="loc-info">
<div class="loc-info-text ">
John Doe, MBA<br /><a href="http://maps.google.com/?daddr=41.71727402,-73.00898601" target="_blank">Get Directions »</a>
</div>
</div>
</div>
This is the information I want to extract from the above HTML example into PHP:
lng: -73.00898601, lat: 41.71727402
category: 'office'
Acme Software
John Doe, MBA
123 Main St.
New York, NY 10036
(212) 555-1234
http://www.example.com
I have tried using PHP Simple HTML DOM Parser, but I'm new to it and can't find a working PHP example that fits what I need to do. I tried the PHP code below to understand how it works, but var_dump($e) produces a huge amount of output, including messages about recursion, so I'm lost as to how to actually use it. I would greatly appreciate some help!
$e = $html->find('.dump-location', 0)->find('.SingleLinkNoTx', 0);
echo $e;
var_dump($e);
Use XPath to find and extract elements in an HTML/XML document - specifically, the SimpleXMLElement::xpath method.
The following example will find the telephone number for a location:
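// Parse the HTML into a DOM tree, then hand it to SimpleXML for XPath queries.
$doc = new DOMDocument();
$doc->loadHTML('your html snippet goes here - or use loadHTMLFile()');
$xml = simplexml_import_dom($doc);
// Select the telephone STRONG tag inside each location's SingleLinkNoTx DIV.
$elements = $xml->xpath('//*[contains(@class, "dump-location")]/div[@class="SingleLinkNoTx"]/strong[@class="telephone"]');
print_r($elements);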
The most complex part is the XPath expression. A quick breakdown:
//*[contains(@class, "dump-location")] - any element whose class attribute contains dump-location.
/div[@class="SingleLinkNoTx"] - a child DIV element of that dump-location parent that has a SingleLinkNoTx class (and no other class name).
/strong[@class="telephone"] - child STRONG tags with a telephone class.
Using this XPath expression on the HTML snippet provided in the question will result in output like the following, which is fairly easy to iterate over and extract information from:
Array
(
[0] => SimpleXMLElement Object
(
[@attributes] => Array
(
[class] => telephone
)
[0] => (212) 555-1234
)
)
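To show what iterating over that result could look like, here is a minimal sketch. The $html string is a hypothetical, trimmed-down stand-in for the markup in the question, and $phones is just an illustrative variable name.

```php
<?php
// Hypothetical stand-in for the question's markup, trimmed to the relevant part.
$html = '<div class="dump-location odd"><div class="SingleLinkNoTx">'
      . '<strong>John Doe, MBA</strong><br/>'
      . '<strong class="telephone">(212) 555-1234</strong></div></div>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xml = simplexml_import_dom($doc);

$query = '//*[contains(@class, "dump-location")]'
       . '/div[@class="SingleLinkNoTx"]/strong[@class="telephone"]';

$phones = [];
foreach ($xml->xpath($query) as $element) {
    // Casting a SimpleXMLElement to string yields its text content.
    $phones[] = trim((string) $element);
}

print_r($phones);
```

The string cast is the important step: without it you are still holding a SimpleXMLElement object rather than the phone number itself.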
If you know the document structure, it's possible to construct an XPath expression for each piece of information you want to extract. Alternatively, it might be simpler to use a more general XPath expression (say, one that retrieves all dump-location elements) and manually iterate through the elements.
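The more general approach might be sketched as follows. Note that this sketch swaps in DOMXPath rather than SimpleXMLElement::xpath, because the address lines are bare text nodes between BR tags and those are easier to reach through the DOM. The extractLocations function and its array keys are hypothetical names chosen for illustration, the sample markup is a shortened copy of the question's HTML, and the regular expressions assume data-jmapping always has the shape shown in the question (it is not valid JSON, so json_decode would not parse it).

```php
<?php
// Hypothetical helper (not part of any library): one array per dump-location.
function extractLocations(string $html): array
{
    libxml_use_internal_errors(true);   // tolerate imperfect real-world markup
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    $xpath = new DOMXPath($doc);

    $locations = [];
    foreach ($xpath->query('//*[contains(@class, "dump-location")]') as $node) {
        // Assumed attribute shape: {id: .., point: {lng: .., lat: ..}, category: '..'}
        $mapping = $node->getAttribute('data-jmapping');
        preg_match('/lng:\s*(-?[\d.]+),\s*lat:\s*(-?[\d.]+)/', $mapping, $point);
        preg_match("/category:\s*'([^']*)'/", $mapping, $category);

        $box = './div[@class="SingleLinkNoTx"]';

        // Address lines are the non-blank text nodes directly inside the box.
        $address = [];
        foreach ($xpath->query("$box/text()[normalize-space()]", $node) as $text) {
            $address[] = trim($text->nodeValue);
        }

        $locations[] = [
            'lng'       => $point[1] ?? null,
            'lat'       => $point[2] ?? null,
            'category'  => $category[1] ?? null,
            'name'      => $xpath->evaluate("string($box/a[@class='loc-link'])", $node),
            'contact'   => $xpath->evaluate("string($box/strong[not(@class)])", $node),
            'address'   => $address,
            'telephone' => $xpath->evaluate("string($box/strong[@class='telephone'])", $node),
            'website'   => $xpath->evaluate("string(./a[@class='web_link']/@href)", $node),
        ];
    }
    return $locations;
}

// Usage with a shortened copy of the markup from the question.
$sample = <<<'HTML'
<div id="dump-list">
<div class="dump-row">
<div class="dump-location odd" data-jmapping="{id: 35, point: {lng: -73.00898601, lat: 41.71727402}, category: 'office'}">
<div class="SingleLinkNoTx">
<a href="#10" class="loc-link">Acme Software</a><br/><strong>John Doe, MBA</strong><br/>123 Main St.<br />New York, NY 10036<br /><strong class="telephone">(212) 555-1234</strong><br/>
</div>
<a href="http://www.example.com" target="_blank" class="web_link">Visit Website</a><span><br />(0.3 miles)</span>
</div>
</div>
</div>
HTML;

$locations = extractLocations($sample);
print_r($locations);
```

Each XPath expression inside the loop starts with a dot and is passed $node as its context, so it only looks inside the current dump-location element rather than the whole document.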