I've found a few posts alluding to the fact that you can validate XHTML against its DTD using the nokogiri gem. Whilst I've managed to use it to parse XHTML successfully (looking for 'a' tags etc.), I'm struggling to validate documents.
For me, this:
require 'nokogiri'
require 'net/http'
doc = Nokogiri::XML(Net::HTTP.get(URI.parse("http://www.w3.org")))
puts doc.validate
results in a whole heap of:
[
#<Nokogiri::XML::SyntaxError: No declaration for element html>,
#<Nokogiri::XML::SyntaxError: No declaration for attribute xmlns of element html>,
#<Nokogiri::XML::SyntaxError: No declaration for attribute lang of element html>,
#<Nokogiri::XML::SyntaxError: No declaration for attribute lang of element html>,
#<Nokogiri::XML::SyntaxError: No declaration for element head>,
#<Nokogiri::XML::SyntaxError: No declaration for attribute profile of element head>,
[repeat for every tag in the document.]
]
So I'm assuming that's not the right approach. I can't seem to locate any good examples -- can anyone suggest what I'm doing wrong?
I'm running Ruby 1.8.6 on Mac OS X 10.5.8. Nokogiri tells me:
nokogiri: 1.3.3
warnings: []
libxml:
  compiled: 2.6.23
  loaded: 2.6.23
  binding: extension
It's not just you. What you're doing is supposed to be the right way to do it, but I've never had any luck with it. As far as I can tell, there's some disconnect somewhere between Nokogiri and libxml which causes it not to load SYSTEM DTDs or to recognize PUBLIC DTDs. It will work if you define the DTD within the XML file, but good luck doing that with the XHTML DTDs.
The best thing I can recommend is to use the schemas for XHTML instead:
require 'nokogiri'
require 'open-uri'
# On Ruby 3.0 and later, call URI.open instead of open for URLs.
doc = Nokogiri::XML(open('http://www.w3.org'))
xsd = Nokogiri::XML::Schema(open('http://www.w3.org/2002/08/xhtml/xhtml1-strict.xsd'))
# This is a true/false validation.
xsd.valid?(doc) # => true
# This returns a list of errors (empty when the document is valid).
xsd.validate(doc) # => []