Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to get the content of an XML node using XPath and Nokogiri

I have code like this:

@doc = Nokogiri::HTML(open(url)
@doc.xpath(query).each do |html|

  puts html # how get content of a node
end

How do I get the content of the node instead of something like this:

<li class="stat">
like image 329
John Avatar asked Mar 21 '11 10:03

John


People also ask

What is XPath expression in XML?

XPath uses path expressions to select nodes or node-sets in an XML document. These path expressions look very much like the expressions you see when you work with a traditional computer file system. XPath expressions can be used in JavaScript, Java, XML Schema, PHP, Python, C and C++, and lots of other languages.

What is the use of Nokogiri?

Nokogiri (htpp://nokogiri.org/) is the most popular open source Ruby gem for HTML and XML parsing. It parses HTML and XML documents into node sets and allows for searching with CSS3 and XPath selectors. It may also be used to construct new HTML and XML objects.

What does Rails use Nokogiri for?

Nokogiri (鋸) makes it easy and painless to work with XML and HTML from Ruby. It provides a sensible, easy-to-understand API for reading, writing, modifying, and querying documents. It is fast and standards-compliant by relying on native parsers like libxml2 (CRuby) and xerces (JRuby).

What is Nokogiri gem?

The Nokogiri gem is an incredible open-source tool that parses HTML and XML data. It is one of the most widely used gems available, and it can really take your Ruby app to another level for data with its ability to help you intuitively scrape websites.


2 Answers

This is the Synopsis example in the README file for Nokogiri showing one way to do it using CSS, XPath or a hybrid:

require 'nokogiri'
require 'open-uri'

# Get a Nokogiri::HTML:Document for the page we’re interested in...

doc = Nokogiri::HTML(open('http://www.google.com/search?q=tenderlove'))

# Do funky things with it using Nokogiri::XML::Node methods...

####
# Search for nodes by css
doc.css('h3.r a.l').each do |link|
  puts link.content
end

####
# Search for nodes by xpath
doc.xpath('//h3/a[@class="l"]').each do |link|
  puts link.content
end

####
# Or mix and match.
doc.search('h3.r a.l', '//h3/a[@class="l"]').each do |link|
  puts link.content
end
like image 133
the Tin Man Avatar answered Sep 27 '22 21:09

the Tin Man


See html.content or html.text.

See the Node documentation for more information.

like image 29
Lee Jarvis Avatar answered Sep 27 '22 23:09

Lee Jarvis