
Differences in query algorithms between XPath and CSS

I'm wondering why someone would choose CSS selectors over XPath selectors, or vice versa, when either one would work. I think that understanding the algorithms that process each language would answer this.

There's a lot of documentation on XPath and CSS selectors individually, but I've found very few comparisons. Also, I don't use CSS selectors that much.

Here's what I've read about the differences. (These three references discuss the use of XPath and CSS selectors in Selenium to query HTML, but my question is general.)

  • XPath allows traversal from child to parent
  • CSS selectors have features specific to HTML
  • CSS selectors are faster when you're using Internet Explorer in Selenium
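
To make the first bullet concrete, here is a minimal sketch of child-to-parent traversal. It assumes Python's standard `xml.etree.ElementTree`, which supports only a limited subset of XPath, and the sample document and its `id` values are made up for illustration. The `..` step climbs from the matched `<a>` to its parent, which a CSS selector of that era could not express.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this illustration.
DOC = """
<ul>
  <li><a id="home">Home</a></li>
  <li><a id="docs">Docs</a></li>
</ul>
"""

root = ET.fromstring(DOC)

# Select the <a> by attribute, then step up to its parent with "..".
parent = root.find(".//a[@id='docs']/..")
print(parent.tag)  # li
```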

It looks like CSS selection algorithms are somehow optimized for HTML, but I don't know how.

  1. Is there a paper on how CSS and XPath query algorithms work and how they differ?
  2. Are there other abstract differences between the languages that I'm missing?
Thomas Levine asked Nov 15 '11 18:11



1 Answer

The main difference is how stable the document structure you are targeting is:

  1. XPath is a good query language when the structure matters and/or is stable. You usually specify a path, conditions, an exact offset, and so on. It is also a good language for retrieving a set of similar objects, which is why it is so closely tied to XQuery. Here the document has a stable structure and you need to retrieve repeated or similar sections.

  2. CSS selectors are better suited to CSS stylesheets. Stylesheets cannot rely on the document structure because it changes a lot. Think of one CSS stylesheet applied to all the HTML pages of a website: the content and structure of every page are different. CSS selectors work better here precisely because of that changing structure. You will notice that access is more tag based. Most CSS syntax specifies a set of elements, attributes, ids, classes and so on, and not so much their structure. Here you need to locate sections that do not have a fixed location within the document structure but are marked with certain attributes.
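
The two points above can be sketched with a small experiment. This is only an illustration under stated assumptions: both queries are written in the XPath subset of Python's `xml.etree.ElementTree`, the attribute query stands in for what the CSS selector `p.note` would ask for (it compares the whole `class` value, whereas CSS matches a single class token), and the two pages and the `texts` helper are hypothetical.

```python
import xml.etree.ElementTree as ET

# Two hypothetical pages that carry the same "note" paragraph in different places.
PAGE_A = (
    "<html><body>"
    "<div><p>intro</p></div>"
    "<div><p class='note'>careful</p></div>"
    "</body></html>"
)
PAGE_B = (
    "<html><body>"
    "<p class='note'>careful</p>"
    "<div><p>intro</p></div>"
    "</body></html>"
)

POSITIONAL = "./body/div[2]/p[1]"     # tied to the exact layout
BY_ATTRIBUTE = ".//p[@class='note']"  # roughly what the CSS selector p.note asks for


def texts(page, path):
    """Return the text of every element in `page` that `path` selects."""
    root = ET.fromstring(page)
    return [el.text for el in root.findall(path)]


for name, page in (("A", PAGE_A), ("B", PAGE_B)):
    print(name, texts(page, POSITIONAL), texts(page, BY_ATTRIBUTE))
```

The positional path finds the paragraph on page A but nothing on page B, where the layout differs, while the attribute-based query finds it on both.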


Update: After a closer look at your question I realized that you are more interested in the current implementations, not the nature of the query languages. In that case I cannot give you the answer you are looking for. I can only suppose that the reason is still that one depends more on the structure than the other.

For example, in XPath you must keep track of the structure of the document you are working on. On the other hand, CSS selectors are triggered when a specific tag shows up, and it usually does not matter what came before it. I can imagine that it is much easier to implement a CSS selector algorithm that works as you read a document, while XPath has more cases where you really need the full document and/or must keep strict track of what has been read (because the history and context of what you are reading matter more).
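
As a rough illustration of the "match as you read" idea, here is a minimal sketch that assumes only Python's standard `html.parser`. The `StreamingMatcher` class is hypothetical and handles just one simple selector of the form `tag.class`. It is not a claim about how browsers or Selenium actually implement CSS selection.

```python
from html.parser import HTMLParser


class StreamingMatcher(HTMLParser):
    """Hypothetical helper: matches a `tag.class` selector while parsing."""

    def __init__(self, tag, cls):
        super().__init__()
        self.tag = tag
        self.cls = cls
        self.matches = []  # (line, column) of each matching start tag

    def handle_starttag(self, tag, attrs):
        # Only the current start tag is inspected; no document tree is built.
        classes = (dict(attrs).get("class") or "").split()
        if tag == self.tag and self.cls in classes:
            self.matches.append(self.getpos())


matcher = StreamingMatcher("p", "note")

# Feed the document in chunks, as if it were still arriving.
chunks = [
    "<div><p class='note big'",
    ">a</p><p>b</p>",
    "<p class='note'>c</p></div>",
]
for chunk in chunks:
    matcher.feed(chunk)
matcher.close()

print(len(matcher.matches))  # 2
```

Each match is decided from the start tag alone, so nothing has to be kept in memory apart from the results. A query such as `./body/div[2]/p[1]` would additionally need to count siblings and track ancestors as it goes.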

Now, do not take my update too seriously. I am only guessing here because I have some background in language parsing, but I do not actually have experience with languages designed for data querying.

SystematicFrank answered Oct 12 '22 19:10
