I have HTML code like this:
<div id="first">
<dt>Label1</dt>
<dd>Value1</dd>
<dt>Label2</dt>
<dd>Value2</dd>
...
</div>
My code does not work.
doc.css("first").each do |item|
label = item.css("dt")
value = item.css("dd")
end
Show all the <dt>
tags firsts and then the <dd>
tags and I need "label: value"
First of all, your HTML should have the <dt>
and <dd>
elements inside a <dl>
:
<div id="first">
<dl>
<dt>Label1</dt>
<dd>Value1</dd>
<dt>Label2</dt>
<dd>Value2</dd>
...
</dl>
</div>
but that won't change how you parse it. You want to find the <dt>
s and iterate over them, then at each <dt>
you can use next_element
to get the <dd>
; something like this:
doc = Nokogiri::HTML('<div id="first"><dl>...')
doc.css('#first').search('dt').each do |node|
puts "#{node.text}: #{node.next_element.text}"
end
That should work as long as the structure matches your example.
Under the assumption that some <dt>
may have multiple <dd>
, you want to find all <dt>
and then (for each) find the following <dd>
before the next <dt>
. This is pretty easy to do in pure Ruby, but more fun to do in just XPath. ;)
Given this setup:
require 'nokogiri'
html = '<dl id="first">
<dt>Label1</dt><dd>Value1</dd>
<dt>Label2</dt><dd>Value2</dd>
<dt>Label3</dt><dd>Value3a</dd><dd>Value3b</dd>
<dt>Label4</dt><dd>Value4</dd>
</dl>'
doc = Nokogiri.HTML(html)
Using no XPath:
doc.css('dt').each do |dt|
dds = []
n = dt.next_element
begin
dds << n
n = n.next_element
end while n && n.name=='dd'
p [dt.text,dds.map(&:text)]
end
#=> ["Label1", ["Value1"]]
#=> ["Label2", ["Value2"]]
#=> ["Label3", ["Value3a", "Value3b"]]
#=> ["Label4", ["Value4"]]
Using a Little XPath:
doc.css('dt').each do |dt|
dds = dt.xpath('following-sibling::*').chunk{ |n| n.name }.first.last
p [dt.text,dds.map(&:text)]
end
#=> ["Label1", ["Value1"]]
#=> ["Label2", ["Value2"]]
#=> ["Label3", ["Value3a", "Value3b"]]
#=> ["Label4", ["Value4"]]
Using Lotsa XPath:
doc.css('dt').each do |dt|
ct = dt.xpath('count(following-sibling::dt)')
dds = dt.xpath("following-sibling::dd[count(following-sibling::dt)=#{ct}]")
p [dt.text,dds.map(&:text)]
end
#=> ["Label1", ["Value1"]]
#=> ["Label2", ["Value2"]]
#=> ["Label3", ["Value3a", "Value3b"]]
#=> ["Label4", ["Value4"]]
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With