 

Debug a DOMDocument Object in PHP

I'm trying to debug a large and complex DOMDocument object in PHP. Ideally, I'd like DOMDocument to output in an array-like format.

DOMDocument:

$dom = new DOMDocument();
$dom->loadHTML("<html><body><p>Hello World</p></body></html>");
var_dump($dom); //or something equivalent

This outputs

DOMDocument Object ( ) 

whereas I'd like it to output

DOMDocument:
html
=>body
==>p
===>Hello World

Or something like that. Why is there no handy debug or output for this?!?

asked Mar 26 '09 01:03 by Ray


3 Answers

This answer is probably a little late, but I liked your question!

PHP has nothing built in that solves your problem directly, so there is no XML tree dump or anything similar.

However, PHP has RecursiveTreeIterator, which comes pretty close to your output:

\-<html>
  \-<body>
    \-<p>
      \-Hello World

(It looks better with a more complex X(HT)ML structure.)

Like most iterators, it is simple to use with a foreach:

$tree = new RecursiveTreeIterator($iterator);
foreach($tree as $key => $value)
{
    echo $value . "\n";
}

(You can wrap this inside a function, so you only need to call the function)
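To see RecursiveTreeIterator on its own, here is a minimal sketch that feeds it a nested PHP array through RecursiveArrayIterator instead of a DOM tree (the array is only a stand-in chosen for illustration). Passing the BYPASS_KEY and BYPASS_CURRENT flags hands the loop raw keys and values, and getPrefix() supplies the branch drawing:

```php
<?php
// Stand-in data shaped like the question's document (illustration only).
$data = ['html' => ['body' => ['p' => ['Hello World']]]];

$tree = new RecursiveTreeIterator(
    new RecursiveArrayIterator($data),
    // Hand the loop raw keys and values instead of pre-rendered strings.
    RecursiveTreeIterator::BYPASS_KEY | RecursiveTreeIterator::BYPASS_CURRENT
);

$lines = [];
foreach ($tree as $key => $value) {
    // getPrefix() returns the ASCII branch drawing for the current entry.
    // Arrays are inner nodes, so print their key; leaves print their value.
    $lines[] = $tree->getPrefix() . (is_array($value) ? $key : $value);
}
echo implode("\n", $lines), "\n";
```

It should print the same shape as the output shown above; the DOM-specific work in the rest of this answer is only about providing such a RecursiveIterator for DOM nodes.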

Even though this looks simple, there's one caveat: RecursiveTreeIterator needs a RecursiveIterator over the DOMDocument tree. PHP cannot guess what you need, so that part has to be written in code. As I said, I found the question interesting (and you obviously have not asked for XML output), so I wrote a little code that provides the required recursive iterator. So here we go.

First of all, you might not be familiar with iterators in PHP. That's no obstacle to using the code I'll show, since I'll present it backwards. However, whenever you write code of your own, consider whether you can make use of the iterator capabilities PHP has to offer. I mention this because they help solve common problems and let components that are otherwise unrelated work with each other. For example, RecursiveTreeIterator is built in, and it works with anything you feed it (you can even configure it). However, it needs a RecursiveIterator to operate on.

So let's give it a RecursiveIterator that returns <tag> for DOMNodes that are elements and just the text for text nodes:

class DOMRecursiveDecoratorStringAsCurrent extends RecursiveIteratorDecoratorStub
{
    public function current()
    {
        $node = parent::current();
        $nodeType = $node->nodeType;

        switch($nodeType)
        {
            case XML_ELEMENT_NODE:
                return "<$node->tagName>";

            case XML_TEXT_NODE:
                return $node->nodeValue;

            default:
                return sprintf('(%d) %s', $nodeType, $node->nodeValue);
        }
    }
}

This DOMRecursiveDecoratorStringAsCurrent class (the name is only an example) uses some abstract code in RecursiveIteratorDecoratorStub. The important part, however, is the ::current() method, which returns the tag name of an element node in angle brackets (<>) and the text of text nodes as-is; any other node type is printed as its numeric nodeType followed by its value. That's all your output needs, so that's all the code required here.
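The switch in current() keys off DOMNode::nodeType. As a standalone sketch, the hypothetical helper node_label() below repeats that switch (using nodeName, which equals tagName for elements) on a small document that also contains a comment, so the default branch becomes visible:

```php
<?php
// Hypothetical helper mirroring DOMRecursiveDecoratorStringAsCurrent::current().
function node_label(DOMNode $node): string
{
    switch ($node->nodeType) {
        case XML_ELEMENT_NODE:
            return "<{$node->nodeName}>";
        case XML_TEXT_NODE:
            return $node->nodeValue;
        default:
            // Comments (XML_COMMENT_NODE = 8) and other node types end up here.
            return sprintf('(%d) %s', $node->nodeType, $node->nodeValue);
    }
}

$dom = new DOMDocument();
$dom->loadHTML('<html><body><p>Hello World</p><!--note--></body></html>');
$body = $dom->getElementsByTagName('body')->item(0);

$labels = [];
foreach ($body->childNodes as $child) {
    $labels[] = node_label($child);            // the <p> element and the comment
}
$labels[] = node_label($body->firstChild->firstChild); // the text node inside <p>
echo implode("\n", $labels), "\n";
```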

This does not work until you have the abstract code as well, but to show how it's used (the most interesting part), let's look at it:

$iterator = new DOMRecursiveDecoratorStringAsCurrent($iterator);
$tree = new RecursiveTreeIterator($iterator);
foreach($tree as $key => $value)
{
    echo $value . "\n";
}

Because we're working backwards, for the moment we have only specified what each DOMNode looks like when RecursiveTreeIterator displays it. Fine so far, and easy to follow. But the missing meat is in the abstract code and in how to create a RecursiveIterator over all nodes inside a DOMElement. Here is a preview of how the whole code is invoked (as said before, you can put this into a function, for example one called xmltree_dump, to make it easily accessible for debugging):

$dom = new DOMDocument();
$dom->loadHTML("<html><body><p>Hello World</p></body></html>");
$iterator = new DOMRecursiveIterator($dom->documentElement);
$iterator = new DOMRecursiveDecoratorStringAsCurrent($iterator);
$tree = new RecursiveTreeIterator($iterator);
foreach($tree as $key => $value)
{
    echo $value . "\n";
}

So what do we have here in addition to the code already covered? Only a DOMRecursiveIterator; the rest is standard DOMDocument code.

So let's talk about DOMRecursiveIterator. It is the RecursiveIterator that RecursiveTreeIterator ultimately needs. It gets decorated so that the tree dump prints tag names in brackets and text as-is.

It's probably worth sharing its code now:

class DOMRecursiveIterator extends DOMIterator implements RecursiveIterator
{
    public function hasChildren()
    {
        return $this->current()->hasChildNodes();
    }
    public function getChildren()
    {
        $children = $this->current()->childNodes;
        return new self($children);
    }
}

It's a pretty short class with only two methods. I'm cheating here, because this class extends another class. But as I said, this is backwards: this class takes care of the recursion, via hasChildren() and getChildren(). Even those two methods contain very little code; they simply map the "question" (has children? get children?) onto a standard DOMNode. If a node has children, say yes, or return them; since this is an iterator, they are returned as an iterator, hence the new self().
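The same two questions can be asked without any iterator classes. As an alternative sketch (plain recursion, swapped in for illustration; not the iterator technique of this answer), the hypothetical function dom_dump_plain() walks hasChildNodes() and childNodes directly and builds the indentation style sketched in the question:

```php
<?php
// Hypothetical helper: plain recursion over a DOMNode, no iterator classes.
function dom_dump_plain(DOMNode $node, int $depth = 0): string
{
    // Elements print their tag name; other nodes (text, comments, ...) their value.
    $label = $node->nodeType === XML_ELEMENT_NODE ? $node->nodeName : (string) $node->nodeValue;
    // Depth 0 gets no marker; deeper levels get one "=" per level plus ">".
    $out = ($depth > 0 ? str_repeat('=', $depth) . '>' : '') . $label . "\n";
    if ($node->hasChildNodes()) {               // cf. DOMRecursiveIterator::hasChildren()
        foreach ($node->childNodes as $child) { // cf. DOMRecursiveIterator::getChildren()
            $out .= dom_dump_plain($child, $depth + 1);
        }
    }
    return $out;
}

$dom = new DOMDocument();
$dom->loadHTML('<html><body><p>Hello World</p></body></html>');
$dump = dom_dump_plain($dom->documentElement);
echo $dump;
```

This is shorter, but the output format is hard-wired into the traversal; the iterator version keeps traversal and formatting separate, so RecursiveTreeIterator can be reused and reconfigured.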

Since that class is so short, let's move on to its parent class, DOMIterator (the implements RecursiveIterator in the subclass is what makes it usable by RecursiveTreeIterator):

class DOMIterator extends IteratorDecoratorStub
{
    public function __construct($nodeOrNodes)
    {
        if ($nodeOrNodes instanceof DOMNode)
        {
            $nodeOrNodes = array($nodeOrNodes);
        }
        elseif ($nodeOrNodes instanceof DOMNodeList)
        {
            $nodeOrNodes = new IteratorIterator($nodeOrNodes);
        }
        if (is_array($nodeOrNodes))
        {
            $nodeOrNodes = new ArrayIterator($nodeOrNodes);
        }

        if (! $nodeOrNodes instanceof Iterator)
        {
            throw new InvalidArgumentException('Not an array, DOMNode or DOMNodeList given.');
        }

        parent::__construct($nodeOrNodes);
    }
}

This is the base iterator for DOM; it takes a DOMNode or a DOMNodeList (or an array of nodes) to iterate over. This may sound superfluous, as DOM already sort of supports this with DOMNodeList, but DOMNodeList is not a RecursiveIterator, and we already know that RecursiveTreeIterator needs one for the output. So its constructor creates an Iterator and passes it on to the parent class, which again is abstract code. I'll show that code in just a minute. Since this is backwards, let's review what's been done so far:

  • RecursiveTreeIterator for the tree-like output.
  • DOMRecursiveDecoratorStringAsCurrent for the visualization of a DOMNode in the tree.
  • DOMRecursiveIterator and DOMIterator to iterate recursively over all nodes in a DOMDocument.
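DOMIterator's constructor relies on two built-in behaviors: a DOMNodeList is Traversable, so IteratorIterator can wrap it, and a single node becomes a one-element ArrayIterator. A minimal sketch of both conversions in isolation (the sample markup is illustrative):

```php
<?php
$dom = new DOMDocument();
$dom->loadHTML('<html><body><p>a</p><p>b</p></body></html>');

// DOMNodeList -> Iterator: keys are the 0-based positions in the list.
$fromList = [];
foreach (new IteratorIterator($dom->getElementsByTagName('p')) as $i => $node) {
    $fromList[] = $i . ':' . $node->textContent;
}

// Single DOMNode -> one-element ArrayIterator.
$fromNode = [];
foreach (new ArrayIterator([$dom->documentElement]) as $i => $node) {
    $fromNode[] = $i . ':' . $node->nodeName;
}

echo implode(' ', $fromList), ' | ', implode(' ', $fromNode), "\n";
```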

In terms of definitions, that's all that's needed; however, the code I called abstract is still missing. It's just simple proxy code that delegates each method call to another object. A related pattern is called Decorator. Here is the code, first for the Iterator and then for its RecursiveIterator counterpart:

abstract class IteratorDecoratorStub implements OuterIterator
{
    private $iterator;
    public function __construct(Iterator $iterator)
    {
        $this->iterator = $iterator;
    }
    public function getInnerIterator()
    {
        return $this->iterator;
    }
    public function rewind()
    {
        $this->iterator->rewind();
    }
    public function valid()
    {
        return $this->iterator->valid();
    }
    public function current()
    {
        return $this->iterator->current();
    }
    public function key()
    {
        return $this->iterator->key();
    }
    public function next()
    {
        $this->iterator->next(); 
    }
}

abstract class RecursiveIteratorDecoratorStub extends IteratorDecoratorStub implements RecursiveIterator
{
    public function __construct(RecursiveIterator $iterator)
    {
        parent::__construct($iterator);
    }
    public function hasChildren()
    {
        return $this->getInnerIterator()->hasChildren();
    }
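    public function getChildren()
    {
        // new static: wrap the children in the same concrete decorator class,
        // e.g. DOMRecursiveDecoratorStringAsCurrent, so every level is decorated.
        return new static($this->getInnerIterator()->getChildren());
    }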
}

That's nothing magical; it just delegates the method calls to the inner $iterator object. It looks repetitive, and, well, iterators are about repetition. I put this into abstract classes so I only need to write this very simple code once; at least I don't have to repeat myself.

These two abstract classes are used by the other classes discussed earlier. Because they are so simple, I left them until now.
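For comparison, PHP already ships a delegating OuterIterator: IteratorIterator forwards rewind(), valid(), current(), key() and next() to the wrapped iterator, much like IteratorDecoratorStub does. A minimal decorator sketch on top of it that overrides only current(); the class name and the upper-casing are arbitrary examples, and the `: mixed` return type assumes PHP 8.0 or later:

```php
<?php
// Decorator sketch: IteratorIterator delegates everything except current().
class UpperCaseCurrent extends IteratorIterator
{
    public function current(): mixed
    {
        return strtoupper(parent::current());
    }
}

$upper = [];
foreach (new UpperCaseCurrent(new ArrayIterator(['a' => 'html', 'b' => 'body'])) as $key => $value) {
    $upper[$key] = $value; // keys pass through untouched, values are decorated
}
echo json_encode($upper), "\n";
```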

Well, that was a lot to read, but the good news is: that's it.

In short: PHP does not have this built in, but you can write it yourself quite simply and in a reusable way. As mentioned earlier, it's a good idea to wrap this in a function called xmltree_dump so it can easily be called for debugging:

function xmltree_dump(DOMNode $node)
{
    $iterator = new DOMRecursiveIterator($node);
    $decorated = new DOMRecursiveDecoratorStringAsCurrent($iterator);
    $tree = new RecursiveTreeIterator($decorated);
    foreach($tree as $key => $value)
    {
        echo $value . "\n";
    }
}

Usage:

$dom = new DOMDocument();
$dom->loadHTML("<html><body><p>Hello World</p></body></html>");
xmltree_dump($dom->documentElement);

The only other requirement is that all the class definitions used are included or required. You can put them in one file and use require_once, or integrate them with the autoloader you're probably using.

If you need to change the output format, you can edit DOMRecursiveDecoratorStringAsCurrent or change the configuration of the RecursiveTreeIterator inside xmltree_dump. Hope this is helpful (even if quite lengthy; going backwards is pretty indirect).
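As one example of changing the RecursiveTreeIterator configuration, setPrefixPart() replaces individual pieces of the drawn prefix. The sketch below again uses a nested array as a stand-in for the DOM iterator and gets close to the `=>` style from the question; the root line keeps a lone `>` because the end-of-branch piece is shared by all levels:

```php
<?php
// Stand-in data shaped like the question's document (illustration only).
$data = ['html' => ['body' => ['p' => ['Hello World']]]];
$tree = new RecursiveTreeIterator(
    new RecursiveArrayIterator($data),
    RecursiveTreeIterator::BYPASS_KEY | RecursiveTreeIterator::BYPASS_CURRENT
);
// One "=" per ancestor level (with or without following siblings) ...
$tree->setPrefixPart(RecursiveTreeIterator::PREFIX_MID_HAS_NEXT, '=');
$tree->setPrefixPart(RecursiveTreeIterator::PREFIX_MID_LAST, '=');
// ... then ">" right before the entry itself.
$tree->setPrefixPart(RecursiveTreeIterator::PREFIX_END_HAS_NEXT, '>');
$tree->setPrefixPart(RecursiveTreeIterator::PREFIX_END_LAST, '>');

$lines = [];
foreach ($tree as $key => $value) {
    $lines[] = $tree->getPrefix() . (is_array($value) ? $key : $value);
}
echo implode("\n", $lines), "\n";
```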

answered Sep 21 '22 13:09 by hakre


http://usphp.com/manual/en/function.dom-domdocument-savexml.php

$dom->formatOutput = true;
echo $dom->saveXML();
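saveXML() also accepts an optional DOMNode argument and then serializes only that node. A sketch, assuming you want to leave out the XML declaration and the DOCTYPE that loadHTML() adds to the document:

```php
<?php
$dom = new DOMDocument();
// Ignore whitespace-only text nodes; otherwise formatOutput may not re-indent.
$dom->preserveWhiteSpace = false;
$dom->loadHTML('<html><body><p>Hello World</p></body></html>');
$dom->formatOutput = true;

// Passing the root element serializes just that subtree.
$xml = $dom->saveXML($dom->documentElement);
echo $xml, "\n";
```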
answered Sep 23 '22 13:09 by Phill Pafford


For a DOM node, just use the following:

print_r(simplexml_import_dom($entry)->asXML());
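The snippet assumes `$entry` is a node you already hold. A self-contained sketch in which `$entry` is simply the first <p> element (an illustrative choice); for an element that is not the document root, asXML() returns only that element's markup:

```php
<?php
$dom = new DOMDocument();
$dom->loadHTML('<html><body><p>Hello World</p></body></html>');
$entry = $dom->getElementsByTagName('p')->item(0); // illustrative node to inspect

// Import the DOM node into SimpleXML and serialize just that element.
$xml = simplexml_import_dom($entry)->asXML();
print_r($xml);
echo "\n";
```

$dom->saveXML($entry) should return the same string without going through SimpleXML.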
answered Sep 23 '22 13:09 by kayue