I found this one, http://simplehtmldom.sourceforge.net/, but it failed when I tried to fetch this page, http://php.net/manual/en/function.curl-setopt.php, and parse it to plain HTML: it returned only part of the page.
What I want to do is go to an HTML page and get its components individually (the contents of all div and p elements, in a hierarchy). I like the features of simplehtmldom, so I need a similar parser that copes with all kinds of markup, from the best-formed to the worst.
I often use DOMDocument::loadHTML, which works reasonably well in the general case, and I like querying the documents, once they are loaded as DOM, with XPath.
Unfortunately, I suppose that in some cases, if the HTML page is really too badly formed, some parsing problems can occur... That's when you start to understand that respecting web standards is a great idea...
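For illustration, here is a minimal sketch of that approach: load the markup with DOMDocument::loadHTML, then use DOMXPath to collect the p elements under each div. The $html string and the $result array are made-up names for this example, not anything from the question; in practice the markup would come from the page you fetched (with cURL or file_get_contents, for instance).

```php
<?php
// Hypothetical sample markup, deliberately a little sloppy (unclosed <p> tags).
$html = '<html><body>'
      . '<div id="main"><p>First paragraph<p>Second paragraph</div>'
      . '<div><p>Third</p></div>'
      . '</body></html>';

// Collect libxml parser warnings internally instead of emitting PHP warnings.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// For every <div>, gather the text of the <p> elements nested inside it.
$result = [];
foreach ($xpath->query('//div') as $div) {
    $paragraphs = [];
    foreach ($xpath->query('.//p', $div) as $p) {
        $paragraphs[] = trim($p->textContent);
    }
    $result[] = $paragraphs;
}

print_r($result);
```

Passing the div node as the second argument to query() makes the relative expression .//p search only inside that div, which is what keeps the hierarchy. Calling libxml_use_internal_errors(true) only silences the warnings; it does not make the parser any more tolerant of badly broken pages.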