
How do I get the next HTML element in Nokogiri?

Tags: ruby, nokogiri

Let's say my HTML document is like:

<div class="headline">News</div>
<p>Some interesting news here</p>
<div class="headline">Sports</div>
<p>Baseball is fun!</p>

I can get the headline divs with the following code:

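require 'nokogiri'
require 'open-uri'

url = "mypage.html"
# URI.open accepts local paths and URLs; plain Kernel#open no longer opens URLs on Ruby 3.0+
doc = Nokogiri::HTML(URI.open(url))

# Print the text of every element with class "headline"
doc.css(".headline").each do |item|
  puts item.text
end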

But how do I access the content of the p tag that follows each headline, so that "News" is associated with "Some interesting news here", and so on?

asked Mar 22 '11 15:03 by cbmeeks



1 Answer

You want Node#next_element:

doc.css(".headline").each do |item|
  puts item.text
  puts item.next_element.text
end

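There is also item.next, but it returns the next sibling node of any kind, including the whitespace text nodes between tags, whereas item.next_element returns only element nodes (such as p). Both return nil when there is no following sibling of the requested kind.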

answered Oct 07 '22 18:10 by Nathan Ostgard