Parsing HTML with Nokogiri in Ruby

With this HTML code:

<div class="one">
  .....
</div>
<div class="one">
  .....
</div>
<div class="one">
  .....
</div>
<div class="one">
  .....
</div>

How can I select with Nokogiri the second or third div whose class is one?

Ozil Maq asked Apr 22 '12 19:04


1 Answer

You can use Ruby's own indexing to pare down a larger result set to specific items:

page.css('div.one')[1,2]  # Two items starting at index 1 (2nd item)
page.css('div.one')[1..2] # Items with indices between 1 and 2, inclusive

Because Ruby indexing starts at zero, the second item is at index 1 and the third is at index 2.

Alternatively, you can use CSS selectors to find the nth item:

# Second and third items from the set, jQuery-style
page.css('div.one:eq(2),div.one:eq(3)')

# Second and third children, CSS3-style
page.css('div.one:nth-child(2),div.one:nth-child(3)')

Or you can use XPath to get back specific matches:

# Second and third div.one among the children of each parent
page.xpath("//div[@class='one'][position()=2 or position()=3]")

# Second and third items in the result set
page.xpath("(//div[@class='one'])[position()=2 or position()=3]")

With both the CSS and XPath alternatives, note that:

  1. Numbering starts at 1, not 0.
  2. You can use at_css and at_xpath to get back only the first matching element, instead of a NodeSet.

    # A NodeSet with a single element in it:
    page.css('div.one:eq(2)')
    
    # The second div element
    page.at_css('div.one:eq(2)')
    

Finally, note that if you are selecting a single element by index with XPath, you can use a shorter format:

# First div.one seen that is the second div.one child of its parent
page.at_xpath('//div[@class="one"][2]')

# Second div.one in the entire document
page.at_xpath('(//div[@class="one"])[2]')

Phrogz answered Oct 28 '22 15:10