xml to json with attributes for php or python

I'm trying to convert some XML to JSON, which is easy enough with PHP:

$file = file_get_contents('data.xml' );
$a = json_decode(json_encode((array) simplexml_load_string($file)),1);
print_r($a);

Taking the following XML

<?xml version="1.0" encoding="UTF-8"?>
<foo>
    <bar>
        <one lang="fr" type="bar">Test</one>
        <one lang="fr" type="foo">Test</one>
        <one lang="fr" type="baz">Test</one>
    </bar>

    <thunk>
        <thud>
            <bar lang="fr" name="bob">test</bar>
            <bar lang="bz" name="frank">test</bar>
            <bar lang="ar" name="alive">test</bar>
            <bar lang="fr" name="bob">test</bar>
        </thud>
    </thunk>

</foo>

And parsing it through SimpleXML produces

Array
(
    [bar] => Array
        (
            [one] => Array
                (
                    [0] => Test
                    [1] => Test
                    [2] => Test
                )

        )

    [thunk] => Array
        (
            [thud] => Array
                (
                    [bar] => Array
                        (
                            [0] => test
                            [1] => test
                            [2] => test
                            [3] => test
                        )

                )

        )

)

Where ideally the output would look like this

{
    "foo": {
        "bar": {
            "one": [
                {
                    "_lang": "fr",
                    "_type": "bar",
                    "__text": "Test"
                },
                {
                    "_lang": "fr",
                    "_type": "foo",
                    "__text": "Test"
                },
                {
                    "_lang": "fr",
                    "_type": "baz",
                    "__text": "Test"
                }
            ]
        },
        "thunk": {
            "thud": {
                "bar": [
                    {
                        "_lang": "fr",
                        "_name": "bob",
                        "__text": "test"
                    },
                    {
                        "_lang": "bz",
                        "_name": "frank",
                        "__text": "test"
                    },
                    {
                        "_lang": "ar",
                        "_name": "alive",
                        "__text": "test"
                    },
                    {
                        "_lang": "fr",
                        "_name": "bob",
                        "__text": "test"
                    }
                ]
            }
        }
    }
}

The trouble is that the output doesn't contain the attributes of the child elements, and some of these elements carry two or more attributes. Is there a way to transform the XML with PHP or Python and include all the attributes found in all the children?

Thanks

Asked Jul 07 '15 15:07 by user2988129

1 Answer

In my answer I'll cover PHP, specifically SimpleXMLElement, which is already part of your code.

The basic way to JSON encode XML with SimpleXMLElement is similar to what you have in your question. You instantiate the XML object and then you json_encode it (Demo):

$xml = new SimpleXMLElement($buffer);
echo json_encode($xml, JSON_PRETTY_PRINT);

The output is already close to what you're looking for, but not exactly it. What you need to do with SimpleXML is change the standard way json_encode encodes the XML object.

This can be done with a new subtype of SimpleXMLElement that implements the JsonSerializable interface. Here is such a class, which for now only reproduces the default way PHP would JSON-serialize the object:

class JsonSerializer extends SimpleXMLElement implements JsonSerializable
{
    /**
     * SimpleXMLElement JSON serialization
     *
     * @return array
     *
     * @link http://php.net/JsonSerializable.jsonSerialize
     * @see JsonSerializable::jsonSerialize
     */
    #[\ReturnTypeWillChange] // avoids the return-type deprecation notice on PHP 8.1+
    public function jsonSerialize()
    {
        return (array) $this;
    }
}

Using it will produce the exact same output (Demo):

$xml = new JsonSerializer($buffer);
echo json_encode($xml, JSON_PRETTY_PRINT);

Now comes the interesting part: changing just the relevant bits of the serialization to get your output.

First of all, you need to distinguish between an element that carries other elements (has children) and a leaf element, of which you want the attributes and the text value:

    if (count($this)) {
        // serialize children if there are children
        ...
    } else {
        // serialize attributes and text for a leaf element
        foreach ($this->attributes() as $name => $value) {
            $array["_$name"] = (string) $value;
        }
        $array["__text"] = (string) $this;
    }

That's done with this if/else. The if-block handles elements with children and the else-block handles leaf elements. As the leaf elements are easier, I've kept their code in the example above. As you can see, the else-block iterates over all attributes, adds each one under its name prefixed with "_", and finally adds the "__text" entry by casting the element to string.
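The leaf branch can be tried in isolation. The following is a minimal sketch, assuming a standalone helper named leafToArray (a hypothetical name, not part of the class) that does the same work as the else-block for a single element:

```php
<?php
// Hypothetical helper, not part of the answer's class: it mirrors the
// else-block for one leaf element.
function leafToArray(SimpleXMLElement $element): array
{
    $array = [];

    // Each attribute becomes an entry named "_<attribute name>".
    foreach ($element->attributes() as $name => $value) {
        $array["_$name"] = (string) $value;
    }

    // Casting the element to string yields its text content.
    $array['__text'] = (string) $element;

    return $array;
}

$leaf = new SimpleXMLElement('<one lang="fr" type="bar">Test</one>');
echo json_encode(leafToArray($leaf)), "\n";
// prints {"_lang":"fr","_type":"bar","__text":"Test"}
```

The string casts matter here: without them each value would still be a SimpleXMLElement object rather than plain text.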

The handling of the children is a bit more convoluted, as you need to distinguish between a single child element, which is stored under its name only, and multiple children with the same name, which require an additional array inside:

        // serialize children if there are children
        foreach ($this as $tag => $child) {
            // child is a single-named element -or- child are multiple elements with the same name - needs array
            if (count($child) > 1) {
                $child = [$child->children()->getName() => iterator_to_array($child, false)];
            }
            $array[$tag] = $child;
        }
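The second argument of iterator_to_array is what keeps same-named children apart. As a small sketch on a made-up three-element document (the XML string is an assumption for illustration only), compare the two modes:

```php
<?php
// Made-up sample: three children that all share the tag name "one".
$bar = new SimpleXMLElement('<bar><one>a</one><one>b</one><one>c</one></bar>');

// Keys preserved (the default): every child has the key "one",
// so each child overwrites the previous one in the resulting array.
$keyed = iterator_to_array($bar);

// Keys discarded: the children end up in a plain list.
$list = iterator_to_array($bar, false);

echo count($keyed), ' ', count($list), "\n";
// prints 1 3
```

With the keys discarded, json_encode sees a zero-based list and emits a JSON array, which is the shape the desired output has for the repeated elements.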

There is one more special case the serialization needs to deal with: your desired output includes the name of the root element. So the routine needs to check whether it is handling the so-called document element (compare with the SimpleXML Type Cheatsheet) and, in that case, wrap the result under that name:

    if ($this->xpath('/*') == array($this)) {
        // the root element needs to be named
        $array = [$this->getName() => $array];
    }

Finally, all that is left to do is return the array:

    return $array;

Compiled together, this is a JsonSerializer done in SimpleXML and tailored to your needs. Here are the class and its invocation at once:

class JsonSerializer extends SimpleXMLElement implements JsonSerializable
{
    /**
     * SimpleXMLElement JSON serialization
     *
     * @return array
     *
     * @link http://php.net/JsonSerializable.jsonSerialize
     * @see JsonSerializable::jsonSerialize
     */
    #[\ReturnTypeWillChange] // avoids the return-type deprecation notice on PHP 8.1+
    public function jsonSerialize()
    {
        if (count($this)) {
            // serialize children if there are children
            foreach ($this as $tag => $child) {
                // child is a single-named element -or- child are multiple elements with the same name - needs array
                if (count($child) > 1) {
                    $child = [$child->children()->getName() => iterator_to_array($child, false)];
                }
                $array[$tag] = $child;
            }
        } else {
            // serialize attributes and text for a leaf element
            foreach ($this->attributes() as $name => $value) {
                $array["_$name"] = (string) $value;
            }
            $array["__text"] = (string) $this;
        }

        if ($this->xpath('/*') == array($this)) {
            // the root element needs to be named
            $array = [$this->getName() => $array];
        }

        return $array;
    }
}

$xml = new JsonSerializer($buffer);
echo json_encode($xml, JSON_PRETTY_PRINT);

Output (Demo):

{
    "foo": {
        "bar": {
            "one": [
                {
                    "_lang": "fr",
                    "_type": "bar",
                    "__text": "Test"
                },
                {
                    "_lang": "fr",
                    "_type": "foo",
                    "__text": "Test"
                },
                {
                    "_lang": "fr",
                    "_type": "baz",
                    "__text": "Test"
                }
            ]
        },
        "thunk": {
            "thud": {
                "bar": [
                    {
                        "_lang": "fr",
                        "_name": "bob",
                        "__text": "test"
                    },
                    {
                        "_lang": "bz",
                        "_name": "frank",
                        "__text": "test"
                    },
                    {
                        "_lang": "ar",
                        "_name": "alive",
                        "__text": "test"
                    },
                    {
                        "_lang": "fr",
                        "_name": "bob",
                        "__text": "test"
                    }
                ]
            }
        }
    }
}
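The question loads the XML with simplexml_load_string, which accepts a class name as its second argument, so a subclass can be used there as well. The following is a minimal sketch, assuming a stand-in subclass named TagLister (hypothetical, used only to keep the example short) and a shortened inline document in place of data.xml:

```php
<?php
// Hypothetical stand-in for a SimpleXMLElement subclass such as JsonSerializer.
class TagLister extends SimpleXMLElement
{
    // Returns the tag names of the direct children.
    public function childTags(): array
    {
        $tags = [];
        foreach ($this as $tag => $child) {
            $tags[] = $tag;
        }
        return $tags;
    }
}

// Shortened sample; with a real file this could be file_get_contents('data.xml').
$file = '<foo><bar><one lang="fr" type="bar">Test</one></bar><thunk/></foo>';

// The second argument names the class the returned object should have.
$xml = simplexml_load_string($file, TagLister::class);
if ($xml === false) {
    exit("Could not parse the XML\n");
}

// Child elements are objects of the same subclass as their parent.
foreach ($xml as $tag => $child) {
    echo $tag, ': ', get_class($child), "\n";
}
```

Because the children are created with the same class as the parent, json_encode can call the custom jsonSerialize() at every level of the tree and not only on the root.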

I hope this was helpful. It's perhaps a little much at once. The JsonSerializable interface is documented in the PHP manual, where you can find more examples. Another example of this kind of XML to JSON conversion here on Stack Overflow is: XML to JSON conversion in PHP SimpleXML.

Answered Sep 28 '22 14:09 by hakre