
PHP: Is DOMDocument or preg_match_all faster for parsing HTML?

Tags:

dom

php

I'm wondering which of these is faster at processing.

Is DOMDocument faster at parsing an HTML page, or is preg_match_all combined with cURL faster? And does DOMDocument leave a trace on the remote server the way cURL does? For example, with cURL we set a user agent to identify who is accessing the page, but DOMDocument seems to have nothing like that.

mathew asked Jan 20 '23 19:01


2 Answers

Does it matter which is faster if one gives you incorrect results?

Matching with regular expressions to get a single bit of data out of the document will be faster than parsing an entire HTML document. But regular expressions cannot parse HTML correctly in all cases.
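To make the correctness point concrete, here is a minimal sketch. The sample markup and the regex are made up for illustration and are not taken from any real page: one link is commented out, and the other has a `>` inside an attribute value.

```php
<?php
// Hypothetical sample markup: a commented-out link, and a real link whose
// title attribute contains a '>' character.
$html = <<<HTML
<html><body>
<!-- <a href="/old">Old link</a> -->
<a title="a > b" href="/real">Real link</a>
</body></html>
HTML;

// Naive regex: grab the href of every <a ...> tag.
preg_match_all('/<a\s[^>]*href="([^"]*)"/i', $html, $matches);
$regexHrefs = $matches[1];

// DOM: let libxml parse the document, then walk the <a> elements.
$previous = libxml_use_internal_errors(true); // keep parser warnings quiet
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();
libxml_use_internal_errors($previous);

$domHrefs = [];
foreach ($doc->getElementsByTagName('a') as $anchor) {
    $domHrefs[] = $anchor->getAttribute('href');
}

var_dump($regexHrefs); // contains only "/old" (the commented-out link)
var_dump($domHrefs);   // contains only "/real"
```

The regex picks up the link inside the comment and misses the real one, because `[^>]*` stops at the `>` inside the title attribute. A more careful pattern could handle this particular input, but each new edge case needs another patch.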

See http://htmlparsing.com/regexes.html, a site I started in order to address this common question. (And for the rest of you reading this, I could use some help. The source is on GitHub, and I need examples for many different languages.)

Andy Lester answered Jan 26 '23 00:01


Regular expressions will likely be faster, but they are also likely the worse choice. Unless you have benchmarked and profiled your application and found nothing else to optimize, you should look into a proper existing parser.
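If you do want numbers before deciding, a rough benchmark might look like the sketch below. The generated page, the helper names (`hrefsWithRegex`, `hrefsWithDom`, `averageMs`) and the iteration count are all assumptions for illustration. It needs PHP 7.4 or later, and the timings will vary by machine and input, so treat them as a hint rather than a verdict.

```php
<?php
// Build a hypothetical, well-formed test page with 500 links.
$rows = [];
for ($i = 0; $i < 500; $i++) {
    $rows[] = '<p><a href="/page/' . $i . '">Link ' . $i . '</a></p>';
}
$html = '<!DOCTYPE html><html><head><title>Bench</title></head><body>'
      . implode("\n", $rows) . '</body></html>';

// Extract all hrefs with a regex (only safe because we control the markup).
function hrefsWithRegex(string $html): array
{
    preg_match_all('/<a\s+href="([^"]*)"/i', $html, $matches);
    return $matches[1];
}

// Extract all hrefs by parsing the page with DOMDocument.
function hrefsWithDom(string $html): array
{
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $hrefs = [];
    foreach ($doc->getElementsByTagName('a') as $anchor) {
        $hrefs[] = $anchor->getAttribute('href');
    }
    return $hrefs;
}

// Average wall-clock time per call, in milliseconds.
function averageMs(callable $fn, int $iterations): float
{
    $start = hrtime(true); // nanoseconds
    for ($i = 0; $i < $iterations; $i++) {
        $fn();
    }
    return (hrtime(true) - $start) / 1e6 / $iterations;
}

$regexMs = averageMs(fn() => hrefsWithRegex($html), 100);
$domMs   = averageMs(fn() => hrefsWithDom($html), 100);

printf("regex: %.3f ms per run\nDOM:   %.3f ms per run\n", $regexMs, $domMs);
```

Both functions return the same list here only because the generated markup is perfectly regular. On real pages, compare the results as well as the timings.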

While regular expressions can be used to match HTML, it takes considerable effort to build a reliable parser out of them. PHP offers several native extensions for working with XML (and HTML) reliably. There are also a number of third-party libraries. See my answer to

  • Best Methods to parse HTML

As for sending a custom user agent, this is possible with DOM too. You have to create a custom stream context and attach it to the underlying libxml functions with libxml_set_streams_context(). You can supply any of the available HTTP stream context options this way. See my answer to

  • DOMDocument::validate() problem

for an example of how to supply a custom user agent.
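As a rough sketch of the idea (not a copy of the linked answer), the following sets a user agent for DOM loads. The function names, the user agent string and the timeout are hypothetical. Note that `loadHTMLFile()` on a URL still issues an ordinary HTTP request through PHP's stream wrapper, so the remote server sees and can log it just as it would a cURL request.

```php
<?php
// Build an HTTP stream context carrying a custom User-Agent header.
// The 10-second timeout is an arbitrary choice for this sketch.
function makeUserAgentContext(string $userAgent)
{
    return stream_context_create([
        'http' => [
            'user_agent' => $userAgent,
            'timeout'    => 10,
        ],
    ]);
}

// Load a remote HTML page into a DOMDocument using that context.
function loadRemoteHtml(string $url, string $userAgent): DOMDocument
{
    // Applies to the next document that libxml loads or writes.
    libxml_set_streams_context(makeUserAgentContext($userAgent));

    $previous = libxml_use_internal_errors(true); // tolerate sloppy markup
    $doc = new DOMDocument();
    $loaded = $doc->loadHTMLFile($url);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if ($loaded === false) {
        throw new RuntimeException("Could not load $url");
    }
    return $doc;
}

// Usage (performs a real HTTP request, so it is left commented out):
// $doc = loadRemoteHtml('https://example.com/', 'MyCrawler/1.0');
// echo $doc->getElementsByTagName('title')->item(0)->textContent;
```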

Gordon answered Jan 25 '23 22:01