
SimpleXML get Element Content between Child Elements

Tags: php, xml, simplexml

I am parsing XML in PHP with SimpleXML and have XML like this:

<xml>
    <element>
        textpart1
            <subelement>subcontent1</subelement>
        textpart2
            <subelement>subcontent2</subelement>
        textpart3
    </element>
</xml>

When I do $xml->element it naturally gives me the whole element, as in all three textparts.

So if I parse this into an array (with a foreach for the children) I get:

0 => textpart1textpart2textpart3, 1 => subcontent1, 2 => subcontent2

I need a way to parse the <element> node so that each textpart that stops at, or begins after, a subelement is treated as its own element.

As a result I am looking for an ordered list that could be expressed in an array like this:

0 => textpart1, 1 => subcontent1, 2 => textpart2, 3 => subcontent2, 4 => textpart3

Is that possible without altering the XML file? Thanks in advance for any hints!

Sebastian asked Nov 24 '25 13:11


1 Answer

As others have said, SimpleXML doesn't have any support for accessing individual text nodes as separate entities, so you will need to supplement it with some DOM methods. Thankfully, you can switch between the two at will using dom_import_simplexml and simplexml_import_dom.
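As a rough illustration of switching back and forth, here is a minimal sketch. The sample document and the `id` attribute are made up for the example; the point is only that both objects appear to wrap the same underlying node, so a change made through DOM is visible through SimpleXML.

```php
<?php
// Hypothetical sample document, used only for illustration.
$sx = simplexml_load_string(
    '<xml><element>text<subelement>sub</subelement></element></xml>'
);

// SimpleXML -> DOM: gives a DOMElement for the same node.
$dom = dom_import_simplexml($sx->element);

// Change something through the DOM API (the attribute name is made up).
$dom->setAttribute('id', '1');

// DOM -> SimpleXML: wrap the DOM node as a SimpleXMLElement again.
$back = simplexml_import_dom($dom);

echo $back->getName(), "\n";      // element
echo $sx->element['id'], "\n";    // 1, the DOM change is visible here too
```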

The key pieces of DOM functionality you need are:

  • the DOMNode->childNodes property (inherited by DOMElement) for accessing all nodes directly under a particular element as an iterable list
  • the DOMNode->nodeType property for determining if a particular child is a text node or an element
  • the DOMNode->nodeValue property to get the actual text
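To see those three properties in action before wrapping them in a function, here is a small sketch that just lists the direct children of an element. The compact sample string is an assumption (not the exact file from the question), and the `$kinds` lookup table is only a helper for readable output.

```php
<?php
// Hypothetical compact sample, written on one line to avoid whitespace nodes.
$sx = simplexml_load_string(
    '<element>textpart1<subelement>subcontent1</subelement>textpart2</element>'
);
$dom_element = dom_import_simplexml($sx);

// Map the node type constants we care about to readable labels.
$kinds = [XML_TEXT_NODE => 'text', XML_ELEMENT_NODE => 'element'];

$lines = [];
foreach ($dom_element->childNodes as $dom_child) {
    $kind = $kinds[$dom_child->nodeType] ?? 'other';
    // For an element, nodeValue is the concatenated text of its descendants.
    $lines[] = $kind . ': ' . $dom_child->nodeValue;
}

echo implode("\n", $lines), "\n";
// text: textpart1
// element: subcontent1
// text: textpart2
```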

Given those, you can write a function which returns an array with a mixture of SimpleXML objects for child elements, and strings for child text nodes, something like this:

function get_child_elements_and_text_nodes($sx_element)
{
    $return = array();

    $dom_element = dom_import_simplexml($sx_element);
    foreach ( $dom_element->childNodes as $dom_child )
    {
        switch ( $dom_child->nodeType )
        {
            case XML_TEXT_NODE:
                $return[] = $dom_child->nodeValue;
            break;
            case XML_ELEMENT_NODE:
                $return[] = simplexml_import_dom($dom_child);
            break;
        }
    }

    return $return;
}

In your case, you need to recurse down the tree. Mixing DOM and SimpleXML as you go makes that a little confusing, so you could instead write the recursion entirely in DOM and convert the SimpleXML object once before running it:

function recursively_find_text_nodes($dom_element)
{
    $return = array();

    foreach ( $dom_element->childNodes as $dom_child )
    {
        switch ( $dom_child->nodeType )
        {
            case XML_TEXT_NODE:
                $return[] = $dom_child->nodeValue;
            break;
            case XML_ELEMENT_NODE:
                $return = array_merge($return, recursively_find_text_nodes($dom_child));
            break;
        }
    }

    return $return;
}

$text_nodes = recursively_find_text_nodes(dom_import_simplexml($simplexml->element));
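One thing to watch for with indented XML like the sample in the question: each text node keeps its surrounding newlines and indentation. A possible variant, sketched below, trims every text node and skips the ones that are empty after trimming. The function name recursively_find_trimmed_text is made up for this example, and trimming is an assumption about what you want; skip it if whitespace is significant in your data.

```php
<?php
// Sample input copied from the question.
$xml_string = <<<XML
<xml>
    <element>
        textpart1
            <subelement>subcontent1</subelement>
        textpart2
            <subelement>subcontent2</subelement>
        textpart3
    </element>
</xml>
XML;

// Hypothetical variant of the recursive function: same traversal, but each
// text node is trimmed and whitespace-only nodes are dropped.
function recursively_find_trimmed_text(DOMNode $dom_node): array
{
    $return = [];

    foreach ($dom_node->childNodes as $dom_child) {
        if ($dom_child->nodeType === XML_TEXT_NODE) {
            $text = trim($dom_child->nodeValue);
            if ($text !== '') {
                $return[] = $text;
            }
        } elseif ($dom_child->nodeType === XML_ELEMENT_NODE) {
            // Descend into the child element, keeping document order.
            $return = array_merge($return, recursively_find_trimmed_text($dom_child));
        }
    }

    return $return;
}

$simplexml = simplexml_load_string($xml_string);
$text_parts = recursively_find_trimmed_text(dom_import_simplexml($simplexml->element));

print_r($text_parts);
// 0 => textpart1, 1 => subcontent1, 2 => textpart2, 3 => subcontent2, 4 => textpart3
```

Note that this only looks at plain text nodes and elements; CDATA sections, comments and the like are ignored, just as in the function above.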

Here's a live demo of that last function.

IMSoP answered Nov 26 '25 04:11


