Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How do I validate XHTML with nokogiri?

I've found a few posts alluding to the fact that you can validate XHTML against its DTD using the nokogiri gem. Whilst I've managed to use it to parse XHTML successfully (looking for 'a' tags etc.), I'm struggling to validate documents.

For me, this:

doc = Nokogiri::XML(Net::HTTP.get(URI.parse("http://www.w3.org")))
puts doc.validate

results in a whole heap of:

[
#<Nokogiri::XML::SyntaxError: No declaration for element html>,
#<Nokogiri::XML::SyntaxError: No declaration for attribute xmlns of element html>,
#<Nokogiri::XML::SyntaxError: No declaration for attribute lang of element html>,  
#<Nokogiri::XML::SyntaxError: No declaration for attribute lang of element html>,
#<Nokogiri::XML::SyntaxError: No declaration for element head>,
#<Nokogiri::XML::SyntaxError: No declaration for attribute profile of element head
[repeat for every tag in the document.]
]

So I'm assuming that's not the right approach. I can't seem to locate any good examples -- can anyone suggest what I'm doing wrong?

I'm running ruby 1.8.6 on Mac OSX 10.5.8. Nokogiri tells me:

nokogiri: 1.3.3
warnings: []

libxml: 
  compiled: 2.6.23
  loaded: 2.6.23
  binding: extension
like image 334
NeilS Avatar asked Aug 17 '09 13:08

NeilS


1 Answers

It's not just you. What you're doing is supposed to be the right way to do it, but I've never had any luck with it. As far as I can tell, there's some disconnect somewhere between Nokogiri and libxml which causes it to not load SYSTEM DTDs, or to recognize PUBLIC DTDs. It will work if you define the DTD within the XML file, but good luck doing that with the XHTML DTDs.

The best thing I can recommend is to use the schemas for XHTML instead:

require 'nokogiri'
require 'open-uri'

doc = Nokogiri::XML(open('http://www.w3.org'))
xsd = Nokogiri::XML::Schema(open('http://www.w3.org/2002/08/xhtml/xhtml1-strict.xsd'))

#this is a true/false validation
xsd.valid?(doc)    # => true

#this gives a listing of errors
xsd.validate(doc)  # => []
like image 51
Pesto Avatar answered Oct 02 '22 13:10

Pesto