I'm trying to use Nokogiri to parse an HTML file with some fairly eccentric markup. Specifically, I'm trying to grab divs that have an id, multiple classes, and an inline style all defined.
The markup looks something like this:
<div id="foo">
<div id="bar" class="baz bang" style="display: block;">
<h2>title</h2>
<dl>
List of stuff
</dl>
</div>
</div>
I'm attempting to grab the <dl> which sits inside the problem <div>. I can get divs with a single id attribute with no problem, but I can't figure out a way of getting Nokogiri to grab divs with both ids and classes.
So these work fine:
content = @doc.xpath("//div[@id='foo']")
content = @doc.css('div#foo')
But these don't return anything:
content = @doc.xpath("//div[@id='bar']")
content = @doc.xpath("div#bar")
Is there something obvious that I'm missing here?
"I can get divs with a single id attribute with no problem, but I can't figure out a way of getting Nokogiri to grab divs with both ids and classes."
You want:
//div[@id='bar' and @class='baz bang' and @style='display: block;']
The following works for me.
require 'rubygems'
require 'nokogiri'
html = %{
<div id="foo">
<div id="bar" class="baz bang" style="display: block;">
<h2>title</h2>
<dl>
List of stuff
</dl>
</div>
</div>
}
doc = Nokogiri::HTML.parse(html)
content = doc
.xpath("//div[@id='foo']/div[@id='bar' and @class='baz bang']/dl")
.inner_html
puts content
I think content = @doc.xpath("div#bar") is a typo and should be content = @doc.css("div#bar") or, better, content = @doc.css("#bar"). The first expression in your second code chunk seems to be ok.