As far as I can tell, when you have multiple types of elements at the same level in an XML document tree, PHP's SimpleXML classes, including SimpleXMLElement and SimpleXMLIterator, don't preserve the order of the elements relative to each other, only the order within each element type.
For example, consider the following structure:
<catalog>
  <book>
    <title>Harry Potter and the Chamber of Secrets</title>
    <author>J.K. Rowling</author>
  </book>
  <book>
    <title>Great Expectations</title>
    <author>Charles Dickens</author>
  </book>
</catalog>
If I had this structure and used either SimpleXMLIterator or SimpleXMLElement to parse it, I would end up with an array that looked something like this:
Array (
    [book] => Array (
        [0] => Array (
            [title] => Array (
                [0] => Harry Potter and the Chamber of Secrets
            )
            [author] => Array (
                [0] => J.K. Rowling
            )
        )
        [1] => Array (
            [title] => Array (
                [0] => Great Expectations
            )
            [author] => Array (
                [0] => Charles Dickens
            )
        )
    )
)
This would be fine, since I only have book elements, and it keeps the order properly within those elements. However, say I add movie elements, too:
<catalog>
  <book>
    <title>Harry Potter and the Chamber of Secrets</title>
    <author>J.K. Rowling</author>
  </book>
  <movie>
    <title>The Dark Knight</title>
    <director>Christopher Nolan</director>
  </movie>
  <book>
    <title>Great Expectations</title>
    <author>Charles Dickens</author>
  </book>
  <movie>
    <title>Avatar</title>
    <director>James Cameron</director>
  </movie>
</catalog>
Parsing with SimpleXMLIterator or SimpleXMLElement would result in the following array:
Array (
    [book] => Array (
        [0] => Array (
            [title] => Array (
                [0] => Harry Potter and the Chamber of Secrets
            )
            [author] => Array (
                [0] => J.K. Rowling
            )
        )
        [1] => Array (
            [title] => Array (
                [0] => Great Expectations
            )
            [author] => Array (
                [0] => Charles Dickens
            )
        )
    )
    [movie] => Array (
        [0] => Array (
            [title] => Array (
                [0] => The Dark Knight
            )
            [director] => Array (
                [0] => Christopher Nolan
            )
        )
        [1] => Array (
            [title] => Array (
                [0] => Avatar
            )
            [director] => Array (
                [0] => James Cameron
            )
        )
    )
)
Because the data is represented this way, I seem to have no way to tell that the order of the books and movies in the XML file was actually book, movie, book, movie. It just separates them into two categories (although it keeps the order within each category).
Does anyone know of a workaround, or a different XML parser that doesn't have this behavior?
"If I ... used either SimpleXMLIterator or SimpleXMLElement to parse it, I would end up with an array": no, you wouldn't. You would end up with an object, which happens to behave like an array in certain ways.
The output of a recursive dump of that object is not the same as the result of iterating over it.
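To make that difference concrete, here is a minimal sketch using a shortened version of the catalog (book, movie, book). Accessing children by name, as in $sx->book, only visits elements with that name, which is roughly the grouping a recursive dump shows; iterating over children() visits every child element in document order. The variable names $booksOnly and $inOrder are just for illustration.

```php
<?php
// A shortened version of the question's catalog: book, movie, book.
$xml = <<<XML
<catalog>
  <book><title>Harry Potter and the Chamber of Secrets</title></book>
  <movie><title>The Dark Knight</title></movie>
  <book><title>Great Expectations</title></book>
</catalog>
XML;

$sx = simplexml_load_string($xml);

// Access by name ($sx->book) only visits the <book> children.
$booksOnly = [];
foreach ($sx->book as $book) {
    $booksOnly[] = (string) $book->title;
}

// children() visits every child element, whatever its name, in document order.
$inOrder = [];
foreach ($sx->children() as $child) {
    $inOrder[] = $child->getName();
}

echo implode(', ', $booksOnly), PHP_EOL;
echo implode(', ', $inOrder), PHP_EOL;
```

This should print "Harry Potter and the Chamber of Secrets, Great Expectations" followed by "book, movie, book".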
In particular, running foreach( $some_node->children() as $child_node ) will give you all the children of a node in the order they appear in the document, regardless of name, as the following demo shows.
Code:
<?php
$xml = <<<EOF
<catalog>
  <book>
    <title>Harry Potter and the Chamber of Secrets</title>
    <author>J.K. Rowling</author>
  </book>
  <movie>
    <title>The Dark Knight</title>
    <director>Christopher Nolan</director>
  </movie>
  <book>
    <title>Great Expectations</title>
    <author>Charles Dickens</author>
  </book>
  <movie>
    <title>Avatar</title>
    <director>James Cameron</director>
  </movie>
</catalog>
EOF;

$sx = simplexml_load_string($xml);

// children() yields every child element of <catalog>, whatever its name,
// in the order it appears in the document.
foreach ( $sx->children() as $node )
{
    echo $node->getName(), '<br />';
}
Output:
book
movie
book
movie
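If you still want an array like the one in the question but with the original order kept, one option is to build it yourself while iterating over children(). The sketch below is one way to do that, not a built-in SimpleXML feature: orderedItems() is a hypothetical helper that stores each item as ['type' => element name, 'fields' => [child name => text]], and it assumes every item has only simple, text-only sub-elements with unique names.

```php
<?php
$xml = <<<XML
<catalog>
  <book>
    <title>Harry Potter and the Chamber of Secrets</title>
    <author>J.K. Rowling</author>
  </book>
  <movie>
    <title>The Dark Knight</title>
    <director>Christopher Nolan</director>
  </movie>
  <book>
    <title>Great Expectations</title>
    <author>Charles Dickens</author>
  </book>
  <movie>
    <title>Avatar</title>
    <director>James Cameron</director>
  </movie>
</catalog>
XML;

/**
 * Hypothetical helper: converts each child of $parent into
 * ['type' => name, 'fields' => [childName => text, ...]],
 * keeping the children in document order.
 * Assumes text-only sub-elements with unique names per item.
 */
function orderedItems(SimpleXMLElement $parent): array
{
    $items = [];
    foreach ($parent->children() as $item) {
        $fields = [];
        foreach ($item->children() as $field) {
            $fields[$field->getName()] = (string) $field;
        }
        $items[] = ['type' => $item->getName(), 'fields' => $fields];
    }
    return $items;
}

$items = orderedItems(simplexml_load_string($xml));

foreach ($items as $item) {
    echo $item['type'], ': ', $item['fields']['title'], PHP_EOL;
}
```

Because the outer array is a plain list, the book, movie, book, movie order survives, and the 'type' key tells you which kind of element each entry came from.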