How can I add entity declarations via XML::Twig programmatically?

For the life of me I cannot understand the XML::Twig documentation for entity handling.

I've got some XML I'm generating with HTML::Tidy. The call is as follows:

my $tidy = HTML::Tidy->new({
    'indent'          => 1,
    'break-before-br' => 1,
    'output-xhtml'    => 0,
    'output-xml'      => 1,
    'char-encoding'   => 'raw',
});

my $str = "foo &nbsp; bar";
my $xml = $tidy->clean("<xml>$str</xml>");

which produces:

<html>
  <head>
    <meta content="tidyp for Linux (v1.02), see www.w3.org" name="generator" />
    <title></title>
  </head>
  <body>foo &nbsp; bar</body>
</html>

XML::Twig (understandably) barfs at the &nbsp;. I want to do some transformations, running it through XML::Twig:

my $twig = XML::Twig->new(
  twig_handlers => {... handlers ...}
);

$twig->parse($xml);

The $twig->parse line barfs on the &nbsp;, but I can't figure out how to add a declaration for the &nbsp; entity programmatically. I tried things like:

my $entity = XML::Twig::Entity->new("nbsp", "&#160;");
$twig->entity_list->add($entity);
$twig->parse($xml);

... but no joy.

Please help =)

Sir Robert asked Dec 21 '22 22:12


2 Answers

A dirty, but efficient, trick in a case like this would be to add a fake DTD declaration.

Then XML::Parser, which does the parsing, will assume that the entity is defined in the DTD and won't barf on it.

To get rid of the fake DTD declaration, print only the root of the twig instead of the whole document. If you need a different declaration, create it and replace the current one:

#!/usr/bin/perl 

use strict;
use warnings;

use XML::Twig;

my $fake_dtd= '<!DOCTYPE head SYSTEM "foo"[]>'; # foo may not even exist

my $xml='<html>
  <head>
    <meta content="tidyp for Linux (v1.02), see www.w3.org" name="generator" />
    <title></title>
  </head>
  <body>foo &nbsp; bar</body>
</html>';

XML::Twig->new->parse( $fake_dtd . $xml)->root->print;

mirod answered Feb 09 '23 00:02


use strict;
use warnings;
use XML::Twig;

# Declare &nbsp; in an internal DTD subset so the parser knows the entity.
my $doctype = '<?xml version="1.0" encoding="utf-8"?><!DOCTYPE html [<!ENTITY nbsp "&#160;">]>';
my $xml = '<html><head><meta content="tidyp for Linux (v1.02), see www.w3.org" name="generator" /><title></title></head><body>foo &nbsp; bar</body></html>';

my $xTwig = XML::Twig->new();

# safe_parse wraps the parse in an eval: it returns false and sets $@ on failure.
$xTwig->safe_parse($doctype . $xml) or die "Failure to parse XML : $@";

print $xTwig->sprint();

bob.faist answered Feb 09 '23 00:02
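
The two answers can be combined: declare the entity in an internal subset so the parse succeeds, then print only the root so the added DOCTYPE does not end up in the output. Below is a minimal sketch of that idea, not a definitive implementation. The helper name with_entity_decls is hypothetical (it is not part of XML::Twig), and the sketch assumes the Tidy output has no XML declaration or DOCTYPE of its own and that its root element is html.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use XML::Twig;

# Hypothetical helper (not an XML::Twig API): prepend a DOCTYPE whose
# internal subset declares the given entities, so the parser accepts them.
# Assumes $xml has no XML declaration or DOCTYPE already, and that the
# root element is <html>.
sub with_entity_decls {
    my ($xml, %entities) = @_;
    my $decls = join '',
        map { qq{<!ENTITY $_ "$entities{$_}">} } sort keys %entities;
    return "<!DOCTYPE html [$decls]>" . $xml;
}

my $xml = '<html><body>foo &nbsp; bar</body></html>';

my $twig = XML::Twig->new;

# safe_parse returns false and sets $@ instead of dying on a parse error.
$twig->safe_parse( with_entity_decls($xml, nbsp => '&#160;') )
    or die "Failure to parse XML: $@";

# Serializing the root element alone leaves the added DOCTYPE out.
print $twig->root->sprint, "\n";
```

Any twig_handlers would be passed to XML::Twig->new as in the question. More entities can be declared by passing further name/value pairs to the helper.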