
Print an XML document without the XML header line at the top

Tags: xml, ruby, nokogiri

I am just trying to find out how to do a to_xml on a Nokogiri::XML::Document or a Nokogiri::XML::DocumentFragment.

Alternatively, I would like to use XPath on a Nokogiri::XML::DocumentFragment. I could not work out how to do that, although I am successfully parsing a Nokogiri::XML::Document.

I am later including a parsed and modified DocumentFragment into another piece of XML, but I'm really getting bitten on what I thought would be some really simple things.

Like trying to do a to_xml on a doc or docfrag, and NOT INCLUDING that xml line at the top. Why so hard?

asked Nov 21 '11 at 21:11 by AKWF


1 Answer

The simplest way to get the XML for a Document without the leading XML declaration (loosely called a "PI", or processing instruction, in the rest of this answer) is to call to_s on the root element instead of the document itself:

require 'nokogiri'
doc = Nokogiri.XML('<hello world="true" />')

puts doc
#=> <?xml version="1.0"?>
#=> <hello world="true"/>

puts doc.root
#=> <hello world="true"/>

The 'correct' way to do it at the document or builder level, though, is to use SaveOptions:

formatted_no_decl = Nokogiri::XML::Node::SaveOptions::FORMAT +
                    Nokogiri::XML::Node::SaveOptions::NO_DECLARATION

puts doc.to_xml( save_with:formatted_no_decl )
#=> <hello world="true"/>

# Making your code shorter, but horribly confusing for future readers
puts doc.to_xml save_with:3
#=> <hello world="true"/>

Note that DocumentFragments do not automatically include this PI:

frag = Nokogiri::XML::DocumentFragment.parse('<hello world="true" />')
puts frag
#=> <hello world="true"/>

If you are seeing a PI in your fragment output, it means it was there when you parsed it.

xml = '<?xml version="1.0"?><hello world="true" />'
frag = Nokogiri::XML::DocumentFragment.parse(xml)
puts frag
#=> <?xml version="1.0"?><hello world="true"/>

If so, and you want to get rid of any PIs, you should be able to do so with a little XPath:

frag.xpath('//processing-instruction()').remove
puts frag

…except that this does not appear to work, because XPath behaves oddly in DocumentFragments. To work around these bugs, do this instead:

# To remove only PIs at the root level of the fragment
frag.xpath('processing-instruction()').remove
puts frag
#=> <hello world="true"/>

# Alternatively, to remove all PIs everywhere, including inside child nodes
frag.xpath('processing-instruction()|.//processing-instruction()').remove

If you have a Builder object, use either of the last two calls below:

builder = Nokogiri::XML::Builder.new{ |xml| xml.hello(world:"true") }

puts builder.to_xml
#=> <?xml version="1.0"?>
#=> <hello world="true"/>

puts builder.doc.root.to_xml
#=> <hello world="true"/>

formatted_no_decl = Nokogiri::XML::Node::SaveOptions::FORMAT +
                    Nokogiri::XML::Node::SaveOptions::NO_DECLARATION

puts builder.to_xml save_with:formatted_no_decl
#=> <hello world="true"/>

answered Sep 29 '22 at 11:09 by Phrogz