Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Method to parse HTML document in Ruby?

like DOMDocument class in PHP, is there any class in RUBY (i.e the core RUBY), to parse and get node elements value from a HTML Document.

like image 372
Harish Kurup Avatar asked Mar 31 '10 17:03

Harish Kurup


People also ask

How do you parse HTML?

If you just want to parse HTML and your HTML is intended for the body of your document, you could do the following : (1) var div=document. createElement("DIV"); (2) div. innerHTML = markup; (3) result = div. childNodes; --- This gives you a collection of childnodes and should work not just in IE8 but even in IE6-7.

What does Nokogiri :: HTML do?

Nokogiri makes an attempt to determine whether a CSS or XPath selector is being passed in. It's possible to create a selector that fools at or search so occasionally it will misunderstand, which is why we have the more specific versions of the methods.

What is Ruby Nokogiri?

Nokogiri (鋸) makes it easy and painless to work with XML and HTML from Ruby. It provides a sensible, easy-to-understand API for reading, writing, modifying, and querying documents. It is fast and standards-compliant by relying on native parsers like libxml2 (CRuby) and xerces (JRuby).

Can we parse HTML?

jsoup can parse HTML files, input streams, URLs, or even strings. It eases data extraction from HTML by offering Document Object Model (DOM) traversal methods and CSS and jQuery-like selectors. jsoup can manipulate the content: the HTML element itself, its attributes, or its text.


2 Answers

There is no built-in HTML parser (yet), but some very good ones are available, in particular Nokogiri.

Meta-answer: For common needs like these, I'd recommend checking out the Ruby Toolbox site. You'll notice that Nokogiri is the top recommendation for HTML parsers

like image 91
Marc-André Lafortune Avatar answered Sep 19 '22 17:09

Marc-André Lafortune


You should check out hpricot. It's exceedingly good. It's not 'core' ruby, but it's a commonly used gem.

like image 29
Peter Avatar answered Sep 16 '22 17:09

Peter