Which is faster, XPath or Regexp?

I am making an add-on for Firefox that loads an HTML page using AJAX (the add-on has its own XUL panel).

So far, I have not looked into how to create a document object, put the AJAX response into it, and then use XPath to find what I need.
Instead, I am loading the contents and parsing them as text with a regular expression.

My question is: which would be better to use, XPath or a regular expression? Which one is faster?

The HTML page will contain hundreds of elements with the same text, and what I basically want to do is count how many of those elements there are.
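For concreteness, here is a minimal sketch of the counting task done both ways. It uses Python's standard library rather than the add-on's JavaScript, and it assumes the response is well-formed (XHTML-like) markup, because `xml.etree.ElementTree` is an XML parser that supports only a limited XPath subset; the sample page, the `li` elements and both function names are hypothetical. In the add-on itself, the parse-then-query route would go through the browser's `DOMParser` and `document.evaluate` instead.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the AJAX response; assumed to be well-formed markup
# (real-world HTML often is not, and would need an HTML-aware parser).
PAGE = """<html><body>
<ul>
  <li class="item">Same text</li>
  <li class="item">Same text</li>
  <li class="item">Other text</li>
  <li class="item">Same text</li>
</ul>
</body></html>"""


def count_with_xpath(markup: str, text: str) -> int:
    """Parse into a tree, then select every <li> whose text equals `text`."""
    root = ET.fromstring(markup)
    # ElementTree's XPath subset includes the [.='value'] predicate (Python 3.7+).
    # Assumes `text` contains no single quote.
    return len(root.findall(f".//li[.='{text}']"))


def count_with_regex(markup: str, text: str) -> int:
    """Treat the response as plain text and count matching <li>...</li> runs."""
    pattern = re.compile(r"<li[^>]*>" + re.escape(text) + r"</li>")
    return len(pattern.findall(markup))


print(count_with_xpath(PAGE, "Same text"))  # 3
print(count_with_regex(PAGE, "Same text"))  # 3
```

On simple, well-formed input like this the two approaches agree; they start to disagree once the markup contains comments, CDATA sections, entities or nesting.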

I want my add-on to work as fast as possible, but I do not know the mechanics behind regular expressions or XPath, so I don't know which is more effective.

I hope I was clear. Thanks!

user1651105 asked Aug 04 '10 13:08


1 Answer

Whenever you are dealing with XML, use XPath (or XSLT, XQuery, SAX, DOM or any other XML-aware method of going through your data). Never use regular expressions for this task.

Why? XML processing is intricate: dealing with all its oddities (external, parsed and unparsed entities, DTDs, processing instructions, whitespace handling and collapsing, Unicode normalization, CDATA sections, etc.) makes it very hard to write a reliable regex-based way of extracting your data. The fact that it has taken the industry years to learn how best to parse XML should be reason enough not to try doing it yourself.
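A small sketch of how a few of those oddities (a comment, a CDATA section and a character reference) trip up a text-level regex while a parser handles them. Python's standard library is used purely for illustration; the sample document is hypothetical and assumed to be well-formed.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical input: two real <li> elements (one written with a character
# reference), plus look-alikes inside a comment and a CDATA section.
DOC = """<ul>
  <li>Same text</li>
  <li>Same&#32;text</li>
  <!-- <li>Same text</li> -->
  <script><![CDATA[ var s = "<li>Same text</li>"; ]]></script>
</ul>"""

# Text-level matching: counts the commented-out and CDATA copies,
# but misses the element whose space is written as &#32;.
regex_count = len(re.findall(r"<li>Same text</li>", DOC))

# Tree-level matching: the parser leaves comments out of the tree, keeps CDATA
# as plain text of <script>, and resolves &#32; to a space.
parser_count = len(ET.fromstring(DOC).findall(".//li[.='Same text']"))

print(regex_count)   # 3
print(parser_count)  # 2
```

The counts are wrong in both directions: two false positives (comment and CDATA) and one false negative (the character reference).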

To answer your question: when it comes to speed (which should not be your primary concern here), it depends heavily on the implementation of the XPath or regex compiler/processor. Sometimes XPath will be faster (e.g., when using keys, if possible, or compiled XSLT); other times regexes will be faster (if you can use a precompiled regex and your query is simple). But regexes are never simple with HTML/XML, because of the problem of matching nested parentheses (here, tags), which cannot be reliably solved with regexes alone.
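The nesting problem shows up with two tiny hypothetical inputs. The sketch below uses precompiled Python patterns (the fast case mentioned above) and shows that neither a lazy nor a greedy quantifier recovers the tree structure, while a parser does. It assumes well-formed input; `div_count` is a hypothetical helper.

```python
import re
import xml.etree.ElementTree as ET

NESTED = "<div>outer <div>inner</div> tail</div>"
SIBLINGS = "<root><div>a</div><div>b</div></root>"

# Compiled once and reused on every call.
LAZY = re.compile(r"<div>(.*?)</div>")
GREEDY = re.compile(r"<div>(.*)</div>")


def div_count(markup: str) -> int:
    """Count <div> elements by parsing, so the parser handles nesting."""
    return sum(1 for _ in ET.fromstring(markup).iter("div"))


# Lazy matching stops at the first </div>, so the outer div is cut short.
print(LAZY.findall(NESTED))      # ['outer <div>inner']
# Greedy matching happens to fit this nesting, but merges siblings into one match.
print(GREEDY.findall(NESTED))    # ['outer <div>inner</div> tail']
print(GREEDY.findall(SIBLINGS))  # ['a</div><div>b']
# The parser sees the actual tree in both cases.
print(div_count(NESTED), div_count(SIBLINGS))  # 2 2
```

Whichever quantifier you pick, some valid input breaks it; fixing that in general means tracking depth, which is what a parser already does.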

If the input is huge, regexes will tend to be faster, unless the XPath implementation supports streaming processing (which I believe Firefox's does not).
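To illustrate what streaming processing means, here is a minimal sketch using Python's `xml.etree.ElementTree.iterparse`, which hands back each element as soon as its end tag is parsed instead of first building the whole document. Note that this is streaming parsing plus a hand-written test, not a streaming XPath engine, and it says nothing about how Firefox works internally; `count_streaming` and the generated input are hypothetical.

```python
import io
import xml.etree.ElementTree as ET


def count_streaming(source, tag: str, text: str) -> int:
    """Count <tag> elements whose text equals `text`, one element at a time.

    `source` is a filename or a binary file object. Assumes the counted
    elements are not nested inside one another.
    """
    count = 0
    # "end" events fire once an element and all of its children are parsed.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            if "".join(elem.itertext()) == text:
                count += 1
            # Free the element's children and text once it has been checked;
            # the emptied element itself stays attached to its parent.
            elem.clear()
    return count


# Hypothetical large input: 10,000 <li> elements, every other one matching.
items = "".join(
    "<li>Same text</li>" if i % 2 == 0 else "<li>Other text</li>"
    for i in range(10_000)
)
data = f"<ul>{items}</ul>".encode("utf-8")

print(count_streaming(io.BytesIO(data), "li", "Same text"))  # 5000
```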

You wrote:

"which is more effective"

The one that gets you quickest to a reliable, stable implementation that is reasonably fast. Use XPath. It is also built into Firefox and other browsers, should you need your code to run in a browser.

Abel answered Sep 30 '22 20:09