I'm trying to convert some XML to JSON, which is easy enough with PHP:
$file = file_get_contents('data.xml');
$a = json_decode(json_encode((array) simplexml_load_string($file)), true);
print_r($a);
Taking the following XML:
<?xml version="1.0" encoding="UTF-8"?>
<foo>
<bar>
<one lang="fr" type="bar">Test</one>
<one lang="fr" type="foo">Test</one>
<one lang="fr" type="baz">Test</one>
</bar>
<thunk>
<thud>
<bar lang="fr" name="bob">test</bar>
<bar lang="bz" name="frank">test</bar>
<bar lang="ar" name="alive">test</bar>
<bar lang="fr" name="bob">test</bar>
</thud>
</thunk>
</foo>
And parsing it through SimpleXML produces:
Array
(
[bar] => Array
(
[one] => Array
(
[0] => Test
[1] => Test
[2] => Test
)
)
[thunk] => Array
(
[thud] => Array
(
[bar] => Array
(
[0] => test
[1] => test
[2] => test
[3] => test
)
)
)
)
Where ideally the output would look like this:
{
"foo": {
"bar": {
"one": [
{
"_lang": "fr",
"_type": "bar",
"__text": "Test"
},
{
"_lang": "fr",
"_type": "foo",
"__text": "Test"
},
{
"_lang": "fr",
"_type": "baz",
"__text": "Test"
}
]
},
"thunk": {
"thud": {
"bar": [
{
"_lang": "fr",
"_name": "bob",
"__text": "test"
},
{
"_lang": "bz",
"_name": "frank",
"__text": "test"
},
{
"_lang": "ar",
"_name": "alive",
"__text": "test"
},
{
"_lang": "fr",
"_name": "bob",
"__text": "test"
}
]
}
}
}
}
The trouble is that the output doesn't contain the attributes of the child elements, and some of these elements have two or more attributes. Is there a way to transform the XML with PHP or Python and include all the attributes found on all the children?
Thanks
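Method 1: Using the xmltodict and json modules (Python). Convert the XML data into a dictionary and store it in a variable, then serialize that dictionary with the json module. JSON objects are surrounded by curly braces { } and are written as key/value pairs. json.loads() takes a JSON string and returns the corresponding Python object (a dict for a JSON object); json.dumps() goes the other way.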
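Since the question also allows Python, here is a minimal sketch of that route. It deliberately uses only the standard library (xml.etree.ElementTree and json) instead of xmltodict, and the function names element_to_dict and xml_to_json are hypothetical, chosen for illustration. It assumes, like the sample XML in the question, that attributes only appear on leaf elements: attributes on elements that have children are dropped, and mixed content is ignored.

```python
import json
import xml.etree.ElementTree as ET


def element_to_dict(elem):
    """Convert an Element to nested dicts using the "_attr" / "__text" convention.

    Hypothetical helper: leaf elements keep their attributes and text,
    elements with children only keep their children.
    """
    children = list(elem)
    if not children:
        # Leaf element: every attribute prefixed with "_", then the text.
        node = {f"_{name}": value for name, value in elem.attrib.items()}
        node["__text"] = (elem.text or "").strip()
        return node

    node = {}
    for child in children:
        value = element_to_dict(child)
        if child.tag in node:
            # Repeated tag name: collect the siblings in a list.
            if not isinstance(node[child.tag], list):
                node[child.tag] = [node[child.tag]]
            node[child.tag].append(value)
        else:
            node[child.tag] = value
    return node


def xml_to_json(xml_text):
    """Parse an XML string and return pretty-printed JSON keyed by the root tag."""
    root = ET.fromstring(xml_text)
    return json.dumps({root.tag: element_to_dict(root)}, indent=2)


if __name__ == "__main__":
    sample = (
        "<foo><bar>"
        '<one lang="fr" type="bar">Test</one>'
        '<one lang="fr" type="foo">Test</one>'
        "</bar></foo>"
    )
    print(xml_to_json(sample))
```

Note that a tag that occurs only once among its siblings stays a plain object rather than a one-element list, so consumers of the JSON may need to handle both shapes.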
To convert an XML document to JSON, follow these steps: Select the XML to JSON action from the Tools > JSON Tools menu. Choose or enter the Input URL of the XML document. Choose the path of the Output file that will contain the resulting JSON document.
To convert XML text to a JavaScript object, use xml2js(). To convert XML text to JSON text, use xml2json().
In my answer I'll cover PHP, specifically SimpleXMLElement which is already part of your code.
The basic way to JSON encode XML with SimpleXMLElement is similar to what you have in your question. You instantiate the XML object and then you json_encode it (Demo):
$xml = new SimpleXMLElement($buffer);
echo json_encode($xml, JSON_PRETTY_PRINT);
This produces output that is already close to, but not exactly, what you're looking for. So what you do with SimpleXML is change the standard way json_encode encodes the XML object.
This can be done with a new subtype of SimpleXMLElement that implements the JsonSerializable interface. Here is such a class, which so far only reproduces the way PHP JSON-serializes the object by default:
class JsonSerializer extends SimpleXMLElement implements JsonSerializable
{
    /**
     * SimpleXMLElement JSON serialization
     *
     * @return array
     *
     * @link http://php.net/JsonSerializable.jsonSerialize
     * @see JsonSerializable::jsonSerialize
     */
    #[\ReturnTypeWillChange] // avoids the PHP 8.1+ return-type deprecation notice; a plain comment on PHP 7
    public function jsonSerialize()
    {
        return (array) $this;
    }
}
Using it will produce the exact same output (Demo):
$xml = new JsonSerializer($buffer);
echo json_encode($xml, JSON_PRETTY_PRINT);
Now comes the interesting part: changing just those bits of the serialization that are needed to get your output.
First of all, you need to distinguish between an element that carries other elements (has children) and a leaf element, for which you want the attributes and the text value:
if (count($this)) {
    // serialize children if there are children
    ...
} else {
    // serialize attributes and text for a leaf element
    foreach ($this->attributes() as $name => $value) {
        $array["_$name"] = (string) $value;
    }
    $array["__text"] = (string) $this;
}
That's done with this if/else: the if-block handles the children and the else-block the leaf elements. As the leaf elements are easier, I've kept them in the example above. As you can see, the else-block iterates over all attributes, adds each one under its name prefixed with "_", and finally adds the "__text" entry by casting the element to string.
The handling of the children is a bit more convoluted, as you need to distinguish between a single child element, which is stored under its name only, and multiple children with the same name, which require an additional array inside:
// serialize children if there are children
foreach ($this as $tag => $child) {
    // child is a single named element -or- child holds multiple elements with the same name and needs an array
    if (count($child) > 1) {
        $child = [$child->children()->getName() => iterator_to_array($child, false)];
    }
    $array[$tag] = $child;
}
Now there is another special case the serialization needs to deal with: you want the root element name encoded as well. So the routine needs to check whether the current element is the root (the so-called document element; compare with the SimpleXML Type Cheatsheet) and, if so, wrap the result under that name:
if ($this->xpath('/*') == array($this)) {
    // the root element needs to be named
    $array = [$this->getName() => $array];
}
Finally, all that is left to do is return the array:
return $array;
Compiled together, this is a JsonSerializer done in SimpleXML, tailored to your needs. Here is the class and its invocation at once:
class JsonSerializer extends SimpleXMLElement implements JsonSerializable
{
    /**
     * SimpleXMLElement JSON serialization
     *
     * @return array
     *
     * @link http://php.net/JsonSerializable.jsonSerialize
     * @see JsonSerializable::jsonSerialize
     */
    #[\ReturnTypeWillChange] // avoids the PHP 8.1+ return-type deprecation notice; a plain comment on PHP 7
    public function jsonSerialize()
    {
        $array = [];
        if (count($this)) {
            // serialize children if there are children
            foreach ($this as $tag => $child) {
                // child is a single named element -or- child holds multiple elements with the same name and needs an array
                if (count($child) > 1) {
                    $child = [$child->children()->getName() => iterator_to_array($child, false)];
                }
                $array[$tag] = $child;
            }
        } else {
            // serialize attributes and text for a leaf element
            foreach ($this->attributes() as $name => $value) {
                $array["_$name"] = (string) $value;
            }
            $array["__text"] = (string) $this;
        }
        if ($this->xpath('/*') == array($this)) {
            // the root element needs to be named
            $array = [$this->getName() => $array];
        }
        return $array;
    }
}
$xml = new JsonSerializer($buffer);
echo json_encode($xml, JSON_PRETTY_PRINT);
Output (Demo):
{
"foo": {
"bar": {
"one": [
{
"_lang": "fr",
"_type": "bar",
"__text": "Test"
},
{
"_lang": "fr",
"_type": "foo",
"__text": "Test"
},
{
"_lang": "fr",
"_type": "baz",
"__text": "Test"
}
]
},
"thunk": {
"thud": {
"bar": [
{
"_lang": "fr",
"_name": "bob",
"__text": "test"
},
{
"_lang": "bz",
"_name": "frank",
"__text": "test"
},
{
"_lang": "ar",
"_name": "alive",
"__text": "test"
},
{
"_lang": "fr",
"_name": "bob",
"__text": "test"
}
]
}
}
}
}
I hope this was helpful. It's perhaps a little much at once. The JsonSerializable interface is documented in the PHP manual, where you can find more examples. Another example of this kind of XML to JSON conversion can be found here on Stack Overflow: XML to JSON conversion in PHP SimpleXML.