
Rails + Builder: generate XML output without entities

How can I make Builder stop encoding 'śćż' and similar characters? I want 'całość' to be printed literally in the XML document. Example:

xml.instruct! :xml, :version => '1.0', :encoding => 'utf-8'
xml.Trader( :'xmlns:xsi' => "http://www.w3.org/2001/XMLSchema-instance",
            :'xmlns:xsd' => "http://www.w3.org/2001/XMLSchema") do
  xml.Informacje do
    xml.RodzajPaczki 'całość'
    xml.Program 'mine'
    xml.WersjaProgramu '1.0'
  end
end

Output:

<?xml version="1.0" encoding="utf-8"?> 
<Trader xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema"> 
 <Informacje>  
  <RodzajPaczki>ca&#322;o&#347;&#263;</RodzajPaczki> 
    <Program>mine</Program> 
    <WersjaProgramu>1.0</WersjaProgramu> 
  </Informacje>
</Trader> 

ca&#322;o&#347;&#263; should be całość. I saw a pseudo-solution, xml.RodzajPaczki {|t| t << 'całość' }, but it does not work correctly: it outdents 'całość' to the left margin of the document.

Casual Coder asked Jul 27 '11 14:07


1 Answer

Here is what is happening. By default, Builder escapes non-ASCII characters such as the ones in całość. You've also mentioned one way to partly fix it:

xml.RodzajPaczki {|t| t << 'całość' }

Unfortunately, when you pass a block to the RodzajPaczki element, Builder assumes that there will be some inner XML, so it adds a newline and applies the indent. In our case there is only inner text and no XML, so we get unsightly output like:

<RodzajPaczki>
całość      </RodzajPaczki>
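As an aside on the escaping itself, the sketch below approximates in plain Ruby how non-ASCII characters turn into numeric character references. It is an illustration only, not Builder's actual implementation, and to_char_refs is a hypothetical name.

```ruby
# Hypothetical helper (not part of Builder): replaces every non-ASCII
# character with a decimal numeric character reference.
def to_char_refs(text)
  text.each_char.map { |ch| ch.ord < 128 ? ch : format('&#%d;', ch.ord) }.join
end

puts to_char_refs('całość')
# prints: ca&#322;o&#347;&#263;
```

An XML parser reads both forms as the same text, so the difference only matters when the file has to be readable as-is. The indentation problem with the block form still needs a fix, though.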

There is an easy way and a harder way to fix this. First the easy way.

Configure Indent To Be Zero

Set the indent to zero (for example by passing :indent => 0 when constructing the XmlMarkup object). Then the fix from above, xml.RodzajPaczki {|t| t << 'całość' }, works as expected, but the output will not be pretty-printed; it will in fact all be on one line:

<?xml version="1.0" encoding="UTF-8"?><Trader xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema"><Informacje><RodzajPaczki>całość</RodzajPaczki><Program>mine</Program><WersjaProgramu>1.0</WersjaProgramu></Informacje></Trader>

You can run this through an external pretty printer if you want it nicely formatted.
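One possible pretty printer is REXML, which ships with standard Ruby installs. The sketch below is a minimal example under that assumption; pretty_print_xml is a hypothetical helper name and the input is a shortened version of the single-line output. Setting compact on the formatter asks it to keep text-only elements on one line.

```ruby
require 'rexml/document'
require 'rexml/formatters/pretty'

# Single-line XML, shortened from the zero-indent output.
compact_xml = '<?xml version="1.0" encoding="UTF-8"?>' \
              '<Trader><Informacje><RodzajPaczki>całość</RodzajPaczki>' \
              '<Program>mine</Program></Informacje></Trader>'

# Hypothetical helper: re-indents an XML string with REXML's Pretty formatter.
def pretty_print_xml(xml_string, indent = 2)
  doc = REXML::Document.new(xml_string)
  formatter = REXML::Formatters::Pretty.new(indent)
  formatter.compact = true # keep <tag>text</tag> on a single line
  output = +''
  formatter.write(doc, output)
  output
end

puts pretty_print_xml(compact_xml)
```

REXML may format the XML declaration slightly differently from Builder (for instance, in its choice of quote characters), so compare the result against whatever consumes the file.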

If you simply must have pretty printed output and want no escaping, we need to patch Builder slightly. This is the harder way to fix this issue.

Patching Builder

We need to patch the initializer of the XmlMarkup class to accept an extra option, :escape. At the same time we patch the XmlBase initializer to take this new option as a parameter. The option defaults to true, to maintain the default behaviour. We then patch the text! method on XmlBase to use the option to decide whether to escape text or not. Here is what it looks like:

require 'builder'

module Builder
  class XmlBase
    # Same arguments as before, plus a trailing escape flag that
    # defaults to true (the original behaviour).
    def initialize(indent=0, initial=0, encoding='utf-8', escape=true)
      @indent = indent
      @level  = initial
      @encoding = encoding.downcase
      @escape = escape
    end

    # Escape text only when the flag is set; otherwise emit it verbatim.
    def text!(text)
      if @escape
        _text(_escape(text))
      else
        _text(text)
      end
    end
  end

  class XmlMarkup
    # Read the new :escape option; a missing (nil) value keeps escaping on.
    def initialize(options={})
      indent = options[:indent] || 0
      margin = options[:margin] || 0
      encoding = options[:encoding] || 'utf-8'
      escape = options[:escape]
      escape = true if escape.nil?
      super(indent, margin, encoding, escape)
      @target = options[:target] || ""
    end
  end
end

We can now use our newly patched builder in the following way (notice that when we construct the XmlMarkup object we pass in our new :escape option with a value of false):

xml = Builder::XmlMarkup.new(:target=>STDOUT, :indent=>3, :encoding => 'utf-8', :escape => false)
xml.instruct! :xml, :version => '1.0', :encoding => 'UTF-8'
xml.Trader(:'xmlns:xsi' => "http://www.w3.org/2001/XMLSchema-instance", :'xmlns:xsd' => "http://www.w3.org/2001/XMLSchema") do 
  xml.Informacje do
    xml.RodzajPaczki('całość')
    xml.Program('mine')
    xml.WersjaProgramu('1.0')
  end
end

The output is as follows:

<?xml version="1.0" encoding="UTF-8"?>
<Trader xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
   <Informacje>
      <RodzajPaczki>całość</RodzajPaczki>
      <Program>mine</Program>
      <WersjaProgramu>1.0</WersjaProgramu>
   </Informacje>
</Trader>

As desired, the text is not escaped. Note that the patch applies this non-escaping behaviour to all text, including markup characters such as & and <, so if you want only some of the text to be non-escaped while other text is still escaped, you'll need to patch Builder to a much greater extent.
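Because unescaped text bypasses Builder's escaping altogether, markup characters such as & and < pass through untouched as well. A minimal sketch of one way to cope, assuming you escape values yourself before handing them to the builder: escape_markup_only and MARKUP_ESCAPES are hypothetical names, not part of Builder.

```ruby
# Hypothetical helper, not part of Builder: escapes only the characters
# that must be escaped in XML text content and leaves non-ASCII alone.
MARKUP_ESCAPES = { '&' => '&amp;', '<' => '&lt;', '>' => '&gt;' }.freeze

def escape_markup_only(text)
  text.gsub(/[&<>]/, MARKUP_ESCAPES)
end

puts escape_markup_only('całość & <reszta>')
# prints: całość &amp; &lt;reszta&gt;
```

With escaping switched off, a call such as xml.RodzajPaczki(escape_markup_only(value)) would keep the document well-formed while leaving characters like ł, ś and ć as they are.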

skorks answered Nov 14 '22 03:11