
What's the difference between PHP's DOM and SimpleXML extensions?

I fail to understand why we need two XML parsers in PHP.

Can someone explain the difference between those two?

Stann asked Jan 26 '11 09:01



2 Answers

I'm going to keep this answer as short as possible so that beginners can take it away easily. I'm also slightly simplifying things for brevity's sake. Jump to the end of this answer for the overstated TL;DR version.


DOM and SimpleXML aren't actually two different parsers. The real parser is libxml2, which both extensions use internally. DOM and SimpleXML are just two ways of working with the same parsed document, and PHP provides functions to convert an object from one to the other (dom_import_simplexml() and simplexml_import_dom()).
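As a minimal sketch of that conversion (the sample XML and variable names are made up for illustration), the same node can be viewed through both APIs without re-parsing:

```php
<?php
// Parse once with SimpleXML.
$sxe = simplexml_load_string('<root><item id="1">one</item></root>');

// View the same underlying node through the DOM API.
$domItem = dom_import_simplexml($sxe->item);
$domItem->setAttribute('id', '2');

// The change made through DOM is visible through SimpleXML,
// because both objects wrap the same libxml2 node.
$idSeenBySimpleXml = (string) $sxe->item['id'];

// And back again: wrap the DOM node as a SimpleXMLElement.
$roundTrip = simplexml_import_dom($domItem);
$text = (string) $roundTrip;

echo $idSeenBySimpleXml, ' ', $text, PHP_EOL;
```

This is why mixing the two extensions in one script is cheap: the conversion does not copy or re-parse the document.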

SimpleXML is intended to be very simple, so it has a small set of functions and is focused on reading and writing data. That is, you can easily read or write an XML file, update some values, or remove some nodes (with some limitations!), and that's it. No fancy manipulation, and you don't have access to the less common node types. For instance, SimpleXML cannot create a CDATA section, although it can read one.
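A short sketch of the kind of work SimpleXML is comfortable with, assuming a made-up `<library>` document (the element and attribute names are hypothetical):

```php
<?php
$xml = <<<XML
<library>
  <book isbn="111"><title>First</title></book>
  <book isbn="222"><title>Second</title></book>
</library>
XML;

$library = simplexml_load_string($xml);

// Reading: child elements are properties, attributes use array syntax.
$firstTitle = (string) $library->book[0]->title;
$secondIsbn = (string) $library->book[1]['isbn'];

// Updating a value.
$library->book[0]->title = 'First (2nd edition)';

// Adding a node with an attribute and a child.
$new = $library->addChild('book');
$new->addAttribute('isbn', '333');
$new->addChild('title', 'Third');

// Removing a node.
unset($library->book[1]);

$bookCount = count($library->book);
$output = $library->asXML();
echo $output;
```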

DOM offers a full-fledged implementation of the W3C DOM plus a couple of non-standard methods such as appendXML. If you're used to manipulating the DOM in JavaScript, you'll find largely the same methods in PHP's DOM. There's basically no limit to what you can do, and it even handles HTML. The flip side of this richness is that it is more complex and more verbose than SimpleXML.
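For comparison, here is a rough sketch of three things mentioned above that DOM can do: creating a CDATA section, appending raw XML, and moving a node. The `<notes>` document is invented for the example.

```php
<?php
$doc = new DOMDocument('1.0', 'UTF-8');
$root = $doc->appendChild($doc->createElement('notes'));

// Create a CDATA section, which SimpleXML cannot do.
$note = $root->appendChild($doc->createElement('note'));
$note->appendChild($doc->createCDATASection('5 < 6 && 7 > 6'));

// Append raw XML through a document fragment (the non-standard appendXML).
$fragment = $doc->createDocumentFragment();
$fragment->appendXML('<note lang="en">raw <b>XML</b></note>');
$root->appendChild($fragment);

// Move a node: inserting an already attached node relocates it.
$root->insertBefore($root->lastChild, $root->firstChild);

$xml = $doc->saveXML();
echo $xml;
```

Note how every step names the node type explicitly; that is the verbosity the answer refers to.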


Side-note

People often wonder/ask what extension they should use to handle their XML or HTML content. Actually the choice is easy because there isn't much of a choice to begin with:

  • if you need to deal with HTML, you don't really have a choice: you have to use DOM
  • if you have to do anything fancy such as moving nodes or appending some raw XML, again you pretty much have to use DOM
  • if all you need to do is read and/or write some basic XML (e.g. exchanging data with an XML service or reading an RSS feed), then you can use either. Or both.
  • if your XML document is so big that it doesn't fit in memory, you can't use either and have to use XMLReader, which is also based on libxml2. It is even more annoying to use, but it still plays nicely with the others.
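To illustrate the last bullet, here is a minimal sketch of streaming with XMLReader and handing each record to SimpleXML. The `<feed>`/`<entry>` names are hypothetical, and a small in-memory string stands in for what would normally be a large file opened with `$reader->open()`.

```php
<?php
$xml = '<feed><entry id="a">Alpha</entry><entry id="b">Beta</entry></feed>';

$reader = new XMLReader();
$reader->XML($xml); // for a real file, use $reader->open($path) instead

$entries = [];
while ($reader->read()) {
    // Only one <entry> is materialized at a time, not the whole document.
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'entry') {
        $entry = simplexml_load_string($reader->readOuterXml());
        $entries[(string) $entry['id']] = (string) $entry;
    }
}
$reader->close();

print_r($entries);
```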

TL;DR

  • SimpleXML is super easy to use but only good for 90% of use cases.
  • DOM is more complex, but can do everything.
  • XMLReader is super complicated, but uses very little memory. Very situational.
Josh Davis answered Sep 28 '22 10:09


In a nutshell:

SimpleXML

  • is for simple XML and/or simple use cases
  • limited API to work with nodes (e.g. cannot program to an interface that much)
  • all nodes are exposed as the same class, SimpleXMLElement (an element node looks the same as an attribute node)
  • nodes are magically accessible, e.g. $root->foo->bar['attribute']

DOM

  • is for any XML use case you might have
  • is an implementation of the W3C DOM API (found implemented in many languages)
  • differentiates between various Node Types (more control)
  • much more verbose due to explicit API (can code to an interface)
  • can parse broken HTML
  • allows you to use PHP functions in XPath queries

Both of these are based on libxml2 and can be influenced to some extent by the libxml functions.
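A rough sketch of the last two DOM bullets and of the libxml functions, assuming an invented HTML snippet with unclosed tags: libxml_use_internal_errors() keeps parse problems from being emitted as warnings, and registerPhpFunctions() exposes a PHP function to XPath.

```php
<?php
// Collect libxml parse errors instead of emitting PHP warnings.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML('<p class="Intro">First<p class="body">Second'); // unclosed tags
$errors = libxml_get_errors(); // inspect if needed
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$xpath->registerNamespace('php', 'http://php.net/xpath');
$xpath->registerPhpFunctions('strtolower'); // allow only this function

// Case-insensitive class match by calling PHP's strtolower() from XPath.
$matches = $xpath->query('//p[php:functionString("strtolower", @class) = "intro"]');
$text = $matches->item(0)->textContent;

echo $text, PHP_EOL;
```

Passing a function name to registerPhpFunctions() rather than calling it with no arguments limits what an XPath expression may invoke, which seems the safer default.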


Personally, I don't like SimpleXML too much. That's because I don't like the implicit access to the nodes, e.g. $foo->bar[1]->baz['attribute']. It ties the actual XML structure to the programming interface. The one-node-type-for-everything approach is also somewhat unintuitive, because the behavior of the SimpleXMLElement magically changes depending on its contents.

For instance, when you have <foo bar="1"/>, the object dump of /foo/@bar will be identical to that of /foo, but echoing them will print different results. Moreover, because both of them are SimpleXMLElement objects, you can call the same methods on them, but a method only takes effect when the underlying node supports it, e.g. calling $el->addAttribute('foo', 'bar') on the first SimpleXMLElement will do nothing. Now of course it is correct that you cannot add an attribute to an attribute node, but the point is that an attribute node would not expose that method in the first place.
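A small sketch of that difference in typing, using a made-up `<foo>` element with some text added so the two string values differ visibly. SimpleXML hands back the same class for the element and the attribute, while DOM gives each its own class with its own methods:

```php
<?php
$foo = simplexml_load_string('<foo bar="1">text</foo>');
$attr = $foo['bar'];

// SimpleXML: element and attribute share one class, so both expose addAttribute().
$elementClass = get_class($foo);
$attributeClass = get_class($attr);

// Yet they behave differently when cast to string.
$elementString = (string) $foo;
$attributeString = (string) $attr;

// DOM: the same two nodes map to distinct classes.
$domElement = dom_import_simplexml($foo);
$domAttr = dom_import_simplexml($attr);

// Only the element class offers setAttribute(); the attribute class does not.
$elementCanSet = method_exists($domElement, 'setAttribute');
$attrCanSet = method_exists($domAttr, 'setAttribute');

var_dump($elementClass, $attributeClass, $elementString, $attributeString);
var_dump(get_class($domElement), get_class($domAttr), $elementCanSet, $attrCanSet);
```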

But that's just my 2c. Make up your own mind :)


On a side note, PHP has not just two XML parsers but several more. SimpleXML and DOM are just the two that parse a document into a tree structure. The others are either pull-based or event-based parsers/readers/writers.

Also see my answer to

  • Best XML Parser for PHP
Gordon answered Sep 28 '22 08:09