I have been at this for hours and cannot make any progress. I do not know how to do the following; I am used to arrays and loops, not Nokogiri objects.
I want to select the table element immediately after the h2 that contains a span with id == "filmography":
<h2><span id="filmography">...
<table> # What I want to find
<tr>
<td>...
So far I have used
objects = page.xpath("//h2") | page.xpath("//table")
to get an array of Nokogiri objects. I test each one for id == "filmography" and would then work with the next object. However, the elements are not returned in the order they appear on the page; they come back as all the h2s, then all the tables.
Could I somehow have all 'h2's and 'table's as element objects in the order they appear on the page, and test the child object 'span' for its id attribute?
All advice appreciated, as I am thoroughly stuck.
This looks like it should work:
page.at_xpath('//h2[span[@id="filmography"]]').next_element
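Nokogiri supports CSS selectors, which make this easy. The selectors below use the descendant combinator, so they match only when the table is nested inside the element with the id, as in the parsed output shown; if the table is a sibling of the h2 instead, navigate from the span or use an XPath with following-sibling.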
doc.at('span#filmography table').to_html
=> "<table><tr>\n<td>...</td>\n </tr></table>"
doc.at('#filmography table').to_html
=> "<table><tr>\n<td>...</td>\n </tr></table>"
The at method returns the first matching node, using either a CSS or XPath selector.
Its NodeSet counterpart is search, which returns a NodeSet (an Array-like collection), so you would have to call first on the result, which only makes for a longer command:
doc.search('span#filmography table').first.to_html
doc.search('#filmography table').first.to_html
Because the span tag has an id attribute, you're safe to use at and only look for #filmography, since IDs are supposed to be unique in a page.