Given:
require 'rubygems'
require 'nokogiri'
value = Nokogiri::HTML.parse(<<-HTML_END)
<html>
<body>
<p id='para-1'>A</p>
<div class='block' id='X1'>
<h1>Foo</h1>
<p id='para-2'>B</p>
</div>
<p id='para-3'>C</p>
<h2>Bar</h2>
<p id='para-4'>D</p>
<p id='para-5'>E</p>
<div class='block' id='X2'>
<p id='para-6'>F</p>
</div>
</body>
</html>
HTML_END
HTML_END
I want to do something like what I can do in Hpricot:
divs = value.search('//div[@id^="para-"]')
Use the XPath function starts-with:
value.xpath('//p[starts-with(@id, "para-")]').each { |x| puts x['id'] }
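Or use the CSS3 prefix-match attribute selector. Note that in the sample above the para- ids are on p elements, so there you would write p[id^="para-"]:
divs = value.css('div[id^="para-"]')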
And some docs you're seeking:
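# Monkey-patch: adds xpath_regex, which accepts [@attr~=/pattern/] predicates.
Nokogiri::XML::Node.send(:define_method, 'xpath_regex') { |*args|
  # Matches one step of the form /tag[@attr~=/pattern/].
  rgxp = /\/([a-z]+)\[@([a-z\-]+)~=\/(.*?)\/\]/
  # Rewrite each such step into a call to the custom regex() function
  # (non-mutating gsub, so the caller's string is left untouched).
  xpath = args[0].gsub(rgxp) { "/#{$1}[regex(.,'#{$2}','#{$3}')]" }
  # The anonymous handler object supplies regex() to the XPath evaluator.
  self.xpath(xpath, Class.new {
    def regex node_set, attr, regex
      node_set.find_all { |node| node[attr] =~ /#{regex}/ }
    end
  }.new)
}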
Usage:
divs = Nokogiri::HTML(page.root.to_html).
xpath_regex("//div[@class~=/axtarget$/]//div[@class~=/^carbo/]")