 

Select Nokogiri element after an element with particular attribute

Tags:

ruby

nokogiri

I have been at this for hours and cannot make any progress. I do not know how to do the following; I am used to arrays and loops, not Nokogiri objects.

I want to select the table element immediately after the h2 that contains a span with id == "filmography":

<h2><span id="filmography">...
<table>  # What I want to find
  <tr>
    <td>...

So far I have used

objects = page.xpath("//h2", "//table")

to get an array of Nokogiri objects. I then test each one for id == "filmography" and want to work with the object that follows it. However, the elements are not returned in the order they appear on the page; instead I get all the h2s first and then all the tables.

Could I somehow get all the h2 and table elements in the order they appear on the page, and then test each h2's child span for its id attribute?

All advice appreciated, as I am thoroughly stuck.

asked Nov 05 '13 23:11 by scrub_lord


2 Answers

This looks like it should work:

page.xpath('//h2/span[@id="filmography"]').first.parent.next_element
answered Nov 03 '22 22:11 by Hew Wolff


Nokogiri supports CSS selectors, which make this easy:

doc.at('span#filmography table').to_html
=> "<table><tr>\n<td>...</td>\n    </tr></table>"

doc.at('#filmography table').to_html
=> "<table><tr>\n<td>...</td>\n    </tr></table>"

at returns the first matching node, using either a CSS or XPath selector.

The NodeSet equivalent is search. It returns a NodeSet, which behaves like an Array, but you would then have to call first on it, which only makes the command longer:

doc.search('span#filmography table').first.to_html
doc.search('#filmography table').first.to_html

Because the span tag has an id attribute, you can safely use at and look only for #filmography, since IDs are supposed to be unique within a page.

answered Nov 03 '22 22:11 by the Tin Man