I would like to know if Nokogiri XPath or CSS parsing works faster with HTML files. How is the speed different?
Nokogiri doesn't have separate XPath and CSS parsing. It parses XML/HTML into a single DOM, which you can then query using either CSS or XPath syntax.
CSS selectors are internally turned into XPath before asking libxml2 to perform the query. As such (for the exact same selectors) the XPath version would be a tiny fraction faster, since the CSS does not need to be converted into XPath first.
However, your question has no general answer; it depends on what you are selecting for and what your XPath looks like. Chances are you wouldn't write the same XPath that Nokogiri generates. For example, see if you can guess the XPath for the following two CSS selectors:
require 'nokogiri'
puts Nokogiri::CSS.xpath_for('#foo')
#=> //*[@id = 'foo']
puts Nokogiri::CSS.xpath_for('div.article a.external')
#=> //div[contains(concat(' ', @class, ' '), ' article ')]//a[contains(concat(' ', @class, ' '), ' external ')]
Unlike a Web browser, id and class attributes have no sped-up cache, so selecting for them does not help. Indeed, the general interpretation of div.article involves far more work than something like div[@class='article'].
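To make the difference concrete, here is a minimal pure-Ruby sketch of what each predicate tests. It only imitates the string logic of the two XPath expressions; it is not how Nokogiri or libxml2 evaluates them, and the helper names class_token_match? and class_exact_match? are hypothetical, made up for this illustration.

```ruby
# Pure-Ruby imitation of the two XPath predicates; no Nokogiri required.
# Both helper names are hypothetical, introduced only for this illustration.

# Mirrors contains(concat(' ', @class, ' '), ' article '):
# pad the attribute with spaces, then look for the space-delimited token.
def class_token_match?(class_attr, token)
  " #{class_attr} ".include?(" #{token} ")
end

# Mirrors @class = 'article': the whole attribute value must be equal.
def class_exact_match?(class_attr, value)
  class_attr == value
end

['article', 'article featured', 'featured article', 'articles'].each do |attr|
  puts format('%-18s token=%-5s exact=%s', attr.inspect,
              class_token_match?(attr, 'article'),
              class_exact_match?(attr, 'article'))
end
```

The token form builds a new string and scans it for every candidate element, while the exact form is a single comparison. The two are not interchangeable, though: the exact form misses elements that carry more than one class.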
As @LBg commented, you should benchmark for yourself if absolute speed is critical.
However, I would suggest this: don't worry about it. Computers are fast. Write what is most convenient for you, the programmer. If a CSS selector is easier to craft, faster to type, and easier to understand when reviewing your code later, use that. Use XPath when you need to do things that you cannot do with the CSS selector syntax.
How long does it take Nokogiri to convert a reasonably complex CSS selector to XPath?
require 'nokogiri'
t = Time.now
1000.times do |i|
  # Use a different CSS string each time to avoid built-in caching
  css = "body#foo table#bar#{i} thead th, body#foo table#bar#{i} tbody td"
  Nokogiri::CSS.xpath_for(css)
end
puts((Time.now - t) / 1000) # average seconds per conversion
#=> 0.000405041
Less than half a millisecond.
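If you want to repeat this kind of measurement yourself, one possible variation is to time with Ruby's monotonic clock instead of Time.now, since it is not affected by system clock adjustments. The sketch below is an assumption-laden illustration rather than the method used above: average_seconds is a hypothetical helper, and the block uses a stand-in string operation so that it runs without Nokogiri installed.

```ruby
# Hypothetical helper: average wall-clock seconds per call of the block.
def average_seconds(iterations)
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  iterations.times { |i| yield i }
  (Process.clock_gettime(Process::CLOCK_MONOTONIC) - start) / iterations
end

# Stand-in workload so this runs without Nokogiri; in practice the block
# would call Nokogiri::CSS.xpath_for with a different selector each time.
avg = average_seconds(1000) { |i| "body#foo table#bar#{i} thead th".split }
puts avg
```

The block receives the iteration index so that each call can build a different selector string, which keeps any caching from skewing the average.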