 

How to use Nokogiri and XPath to get nodes with multiple attributes

I'm trying to use Nokogiri to parse an HTML file with some fairly eccentric markup. Specifically, I'm trying to grab divs that have an id, multiple classes, and a style all defined.

The markup looks something like this:

<div id="foo">
  <div id="bar" class="baz bang" style="display: block;">
    <h2>title</h2>
    <dl>
      List of stuff
    </dl>
  </div>
</div>

I'm attempting to grab the <dl> which sits inside the problem <div>. I can get divs with a single id attribute with no problem, but I can't figure out a way of getting Nokogiri to grab divs with both ids and classes.

So these work fine:

content = @doc.xpath("//div[@id='foo']")
content = @doc.css('div#foo')

But these don't return anything:

content = @doc.xpath("//div[@id='bar']")
content = @doc.xpath("div#bar")

Is there something obvious that I'm missing here?

asked Aug 29 '10 01:08 by TimD


3 Answers

I can get divs with a single id attribute with no problem, but I can't figure out a way of getting Nokogiri to grab divs with both ids and classes.

You want:

//div[@id='bar' and @class='baz bang' and @style='display: block;']
answered Oct 24 '22 20:10 by Dimitre Novatchev


The following works for me.

require 'rubygems'
require 'nokogiri'

html = %{
<div id="foo">
  <div id="bar" class="baz bang" style="display: block;">
    <h2>title</h2>
    <dl>
      List of stuff
    </dl>
  </div>
</div>
}

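doc = Nokogiri::HTML.parse(html)
# Walk from the outer div to the inner div, matching its id and class together
# (the class test is an exact string comparison), then step into its <dl>.
content = doc
  .xpath("//div[@id='foo']/div[@id='bar' and @class='baz bang']/dl")
  .inner_html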

puts content
answered Oct 24 '22 18:10 by AboutRuby


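I think content = @doc.xpath("div#bar") is a typo: div#bar is CSS selector syntax, not XPath, so it should be content = @doc.css("div#bar"), or better, content = @doc.css("#bar"), since an id should be unique in the document anyway. The first expression in your second code chunk seems to be OK.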

answered Oct 24 '22 20:10 by Daniel O'Hara