
How to import an XML string into a PHP DOMDocument

For example, I create a DOMDocument like this:

<?php

$implementation = new DOMImplementation();

$dtd =
  $implementation->createDocumentType
  (
    'html',                                     // qualifiedName
    '-//W3C//DTD XHTML 1.0 Transitional//EN',   // publicId
    'http://www.w3.org/TR/xhtml1/DTD/xhtml1-'
      .'transitional.dtd'                       // systemId
  );

$document = $implementation->createDocument('', '', $dtd);

$elementHtml     = $document->createElement('html');
$elementHead     = $document->createElement('head');
$elementBody     = $document->createElement('body');
$elementTitle    = $document->createElement('title');
$textTitre       = $document->createTextNode('My bweb page');
$attrLang        = $document->createAttribute('lang');
$attrLang->value = 'en';

$document->appendChild($elementHtml);
$elementHtml->appendChild($elementHead);
$elementHtml->appendChild($attrLang);
$elementHead->appendChild($elementTitle);
$elementTitle->appendChild($textTitre);
$elementHtml->appendChild($elementBody);

Now, suppose I have an XHTML string like this:

<?php
$xhtml = '<h1>Hello</h1><p>World</p>';

How can I import it into the <body> node of my DOMDocument?

For now, the only solution I've found is something like this:

<?php
$simpleXmlElement = new SimpleXMLElement($xhtml);

$domElement = dom_import_simplexml($simpleXmlElement);

$domElement = $document->importNode($domElement, true);

$elementBody->appendChild($domElement);

This solution seems very bad to me, and it causes problems, for example when I try it with a string like this:

<?php
$xhtml = '<p>Hello&nbsp;World</p>';

OK, I can work around this problem by converting the XHTML entities into Unicode entities, but that's so ugly...

Any help?

Thanks in advance!
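
The entity conversion mentioned above could be sketched roughly as follows. The helper name namedEntitiesToUtf8 is hypothetical, and the sketch assumes the input is UTF-8 and that only the HTML 4.01 / XHTML 1.0 named entities need handling. It replaces each named entity with its literal character and leaves the five entities that XML predefines alone, so the result can be parsed without a DTD.

```php
<?php
// Hypothetical helper: swap named HTML entities for literal UTF-8
// characters, keeping the five entities that plain XML already knows.
function namedEntitiesToUtf8(string $xhtml): string
{
    $xmlPredefined = ['amp', 'lt', 'gt', 'quot', 'apos'];

    $converted = preg_replace_callback(
        '/&([A-Za-z][A-Za-z0-9]*);/',
        function (array $match) use ($xmlPredefined): string {
            if (in_array($match[1], $xmlPredefined, true)) {
                return $match[0]; // already valid in plain XML
            }
            // Names that are not in the HTML 4.01 table come back unchanged.
            return html_entity_decode($match[0], ENT_QUOTES | ENT_HTML401, 'UTF-8');
        },
        $xhtml
    );

    return $converted ?? $xhtml; // fall back to the input on a regex error
}

$xhtml = namedEntitiesToUtf8('<p>Hello&nbsp;World &amp; more</p>');
$simpleXmlElement = new SimpleXMLElement($xhtml);
```

After this step, &nbsp; has become a real non-breaking space character, while &amp; is still an entity and still well-formed XML.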

Related question:

  • DOMDocument::validate() problem (solved)
Pascal Qyy asked Nov 02 '10 18:11



2 Answers

The problem is that DOM does not take the XHTML DTD into account unless you validate the document against it. Until you do that, DOM doesn't know any of the entities defined in the DTD, nor any of its other rules. Fortunately, we sorted out how to do the validation in that other question, so armed with that knowledge you can do

$document->validate(); // anywhere before importing the other DOM

And then import with

$fragment = $document->createDocumentFragment();
$fragment->appendXML('<h1>Hello</h1><p>Hello&nbsp;World</p>');
$document->getElementsByTagName('body')->item(0)->appendChild($fragment);
$document->formatOutput = TRUE;
echo $document->saveXml();

outputs:

<?xml version="1.0"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
    <title>My bweb page</title>
  </head>
  <body>
    <h1>Hello</h1>
    <p>Hello&nbsp;World</p>
  </body>
</html>

The other way to import XML into another DOM is to use importNode:

$one = new DOMDocument;
$two = new DOMDocument;
$one->loadXml('<root><foo>one</foo></root>');
$two->loadXml('<root><bar><sub>two</sub></bar></root>');
$bar = $two->documentElement->firstChild; // we want to import the bar tree
$one->documentElement->appendChild($one->importNode($bar, TRUE));
echo $one->saveXml();

outputs:

<?xml version="1.0"?>
<root><foo>one</foo><bar><sub>two</sub></bar></root>

However, this cannot work with

<h1>Hello</h1><p>Hello&nbsp;World</p>

because when you load a document into DOM, DOM overwrites everything you told it before about the document. Thus, when using load, libxml (and therefore SimpleXML, DOM and XMLReader) does not know you mean XHTML. It does not know any of the entities defined in the XHTML DTD and will complain about them instead. But even if the string did not contain the entity, it would still not be valid XML, because it lacks a root node. That's why you use the fragment.
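
To illustrate the root-node point in isolation, here is a minimal sketch. It assumes a bare target document without a DTD, and it uses the numeric reference &#160; in place of &nbsp; so that no entity declarations are needed. The variable names $probe, $loaded and $appended are only for illustration.

```php
<?php
// Sketch: a fragment accepts several top-level nodes, loadXML() does not.
$document = new DOMDocument();
$document->loadXML('<html><body/></html>');

// Two sibling elements and a numeric character reference (no DTD required).
$xhtml = '<h1>Hello</h1><p>Hello&#160;World</p>';

// loadXML() expects exactly one root element, so this returns false
// (the parser warning is suppressed with @ for the demonstration).
$probe  = new DOMDocument();
$loaded = @$probe->loadXML($xhtml);

// appendXML() parses the same string as a chunk of sibling nodes instead.
$fragment = $document->createDocumentFragment();
$appended = $fragment->appendXML($xhtml);

// Appending the fragment moves its children into <body>.
$body = $document->getElementsByTagName('body')->item(0);
$body->appendChild($fragment);

echo $document->saveXML($body), "\n";
```

Here $loaded ends up false while $appended is true, and <body> receives both the <h1> and the <p>.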

Gordon answered Sep 20 '22 23:09


You can use a DOMDocumentFragment for this:

$fragment = $document->createDocumentFragment();
$fragment->appendXml($xhtml);
$elementBody->appendChild($fragment);

That's all there is to it...

Edit: Well, if you must accept XHTML (rather than plain well-formed XML), you could use this dirty workaround:

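function xhtmlToDomNode($xhtml) {
    $dom = new DOMDocument();
    // The HTML parser knows named entities such as &nbsp;
    $dom->loadHTML('<html><body>'.$xhtml.'</body></html>');
    $fragment = $dom->createDocumentFragment();
    $body = $dom->getElementsByTagName('body')->item(0);
    // Move the children one at a time; a foreach over the live
    // childNodes list would skip nodes while they are being moved.
    while ($body->firstChild) {
        $fragment->appendChild($body->firstChild);
    }
    return $fragment;
}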

usage:

$fragment = xhtmlToDomNode($xhtml);
// importNode() returns a copy owned by $document; append that copy
$fragment = $document->importNode($fragment, true);
$elementBody->appendChild($fragment);

ircmaxell answered Sep 17 '22 23:09