I have a doubt: which one is faster at processing?
Is DOMDocument or preg_match_all with the curl functions faster for parsing an HTML page? And will DOMDocument leave a trace on the other server the way curl does? For example, with curl we set a user agent to identify who is accessing the page, but with DOMDocument there is nothing like that.
Does it matter which is faster if one gives you incorrect results?
Matching with regular expressions to get a single bit of data out of the document will be faster than parsing an entire HTML document. But regular expressions cannot parse HTML correctly in all cases.
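To illustrate the point, here is a minimal sketch comparing a naive regex with DOMDocument on the same markup. The sample HTML and the pattern are made up for this example; a more careful pattern would cope with these particular cases, but would likely still break on others.

```php
<?php
// Hypothetical sample markup: the second link uses single quotes and a
// different attribute order, and the third link is commented out.
$html = <<<HTML
<html><body>
<a href="/first" class="nav">First</a>
<a class='nav' href='/second'>Second</a>
<!-- <a href="/commented-out">Old</a> -->
</body></html>
HTML;

// Naive regex: only matches a double-quoted href placed right after "<a ".
preg_match_all('/<a href="([^"]*)"/', $html, $matches);
$regexLinks = $matches[1];

// DOM: parse the whole document, then ask for the <a> elements.
$doc = new DOMDocument();
libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$doc->loadHTML($html);
libxml_clear_errors();

$domLinks = [];
foreach ($doc->getElementsByTagName('a') as $a) {
    $domLinks[] = $a->getAttribute('href');
}

echo 'regex: ', implode(', ', $regexLinks), "\n"; // regex: /first, /commented-out
echo 'dom:   ', implode(', ', $domLinks), "\n";   // dom:   /first, /second
```

The regex misses the second link and picks up the one inside the comment, while the parser returns the two links that are actually in the document.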
See http://htmlparsing.com/regexes.html, which I started in order to address this common question. (And for the rest of you reading this, I could use some help. The source is on GitHub, and I need examples for many different languages.)
Regular expressions will likely be faster, but they are also likely the worse choice. Unless you have benchmarked and profiled your application and found nothing else to optimize, you should look into a proper existing parser.
While regular expressions can be used to match HTML, it takes considerable effort to come up with a reliable parser. PHP offers several native extensions for working with XML (and HTML) reliably, and there are also a number of third-party libraries. See my answer to …
As for sending a custom user agent, this is possible with DOM too. You have to create a custom stream context and attach it to the underlying libxml functions. You can supply any of the available HTTP stream context options this way. See my answer to … for an example of how to supply a custom user agent.
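A minimal sketch of that approach, assuming the libxml extension is available. The user agent string, the timeout, and the fetchDocument() helper are placeholders invented for this example, not anything from the linked answer.

```php
<?php
// Hypothetical HTTP context options; the user agent string is a placeholder.
$options = [
    'http' => [
        'method'     => 'GET',
        'user_agent' => 'MyCrawler/1.0 (+http://example.com/bot)',
        'timeout'    => 10,
    ],
];
$context = stream_context_create($options);

// Tell libxml to use this context for the loads that follow.
libxml_set_streams_context($context);

// Hypothetical helper: load a remote page into a DOMDocument, or null on failure.
function fetchDocument(string $url): ?DOMDocument
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // real-world HTML is rarely valid
    $ok = $doc->loadHTMLFile($url);
    libxml_clear_errors();
    return $ok ? $doc : null;
}
```

Calling fetchDocument('http://example.com/') after this setup should send the request with the user agent from the context, so the remote server sees it in its logs just as it would with curl.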