like DOMDocument class in PHP, is there any class in RUBY (i.e the core RUBY), to parse and get node elements value from a HTML Document.
If you just want to parse HTML and your HTML is intended for the body of your document, you could do the following : (1) var div=document. createElement("DIV"); (2) div. innerHTML = markup; (3) result = div. childNodes; --- This gives you a collection of childnodes and should work not just in IE8 but even in IE6-7.
Nokogiri makes an attempt to determine whether a CSS or XPath selector is being passed in. It's possible to create a selector that fools at or search so occasionally it will misunderstand, which is why we have the more specific versions of the methods.
Nokogiri (鋸) makes it easy and painless to work with XML and HTML from Ruby. It provides a sensible, easy-to-understand API for reading, writing, modifying, and querying documents. It is fast and standards-compliant by relying on native parsers like libxml2 (CRuby) and xerces (JRuby).
jsoup can parse HTML files, input streams, URLs, or even strings. It eases data extraction from HTML by offering Document Object Model (DOM) traversal methods and CSS and jQuery-like selectors. jsoup can manipulate the content: the HTML element itself, its attributes, or its text.
There is no built-in HTML parser (yet), but some very good ones are available, in particular Nokogiri.
Meta-answer: For common needs like these, I'd recommend checking out the Ruby Toolbox site. You'll notice that Nokogiri is the top recommendation for HTML parsers
You should check out hpricot. It's exceedingly good. It's not 'core' ruby, but it's a commonly used gem.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With