I am having some grief with an XML feed that I am being sent. I know it is invalid, but the development cycle of the sending program is such that it is not worth waiting for them to correct the error. So I am looking for a workaround: some way to get PHP to read the XML and merge or drop the invalid attribute entries while keeping all the others.
The fault is that a node in the XML has duplicate attributes. I have been using SimpleXML to read the files and process them into useful values, but this line breaks the parse outright. The offending XML looks like this:
<dCategory dec="1102" dup="45" dup="4576" loc="274" mov="31493" prf="23469" unq="240031" xxx="7861" />
What I would really like is the PHP equivalent of C#'s XmlReader.MoveToNextAttribute(). I can't seem to find anything that doesn't just blow up when presented with the duplicate attribute.
Can anyone help out with this?
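One possible workaround, sketched here under stated assumptions rather than offered as a tested library solution: pre-process the raw string before SimpleXML ever sees it, stepping through each start tag's attributes in turn (roughly what MoveToNextAttribute() gives you in C#) and keeping only the first occurrence of each name. The helper dropDuplicateAttributes() is hypothetical. It assumes attribute values are quoted and that no CDATA section or comment contains text that looks like a start tag; a regex pass is not a real XML parser. Keeping the first value (dup="45") is an arbitrary choice; to merge instead, collect the values per name inside the loop.

```php
<?php
// Hypothetical helper (not part of SimpleXML): drop repeated attributes from
// every start tag, keeping the first occurrence of each attribute name.
// Assumes quoted attribute values and no tag-like text inside CDATA/comments.
function dropDuplicateAttributes(string $xml): string
{
    // A start tag: '<', an element name, zero or more attributes, optional '/', '>'
    $tagPattern = '/<([A-Za-z_][\w.\-:]*)((?:\s+[^\s=\/>]+\s*=\s*(?:"[^"]*"|\'[^\']*\'))*)(\s*\/?)>/';
    // One attribute: whitespace, name, '=', quoted value
    $attrPattern = '/\s+([^\s=\/>]+)\s*=\s*("[^"]*"|\'[^\']*\')/';

    return preg_replace_callback($tagPattern, function (array $m) use ($attrPattern): string {
        preg_match_all($attrPattern, $m[2], $attrs, PREG_SET_ORDER);

        $seen = [];
        $kept = '';
        // Step through the attributes one by one, similar to MoveToNextAttribute()
        foreach ($attrs as $attr) {
            $name = $attr[1];
            if (isset($seen[$name])) {
                continue; // repeated name: drop this copy
            }
            $seen[$name] = true;
            $kept .= ' ' . $name . '=' . $attr[2];
        }
        return '<' . $m[1] . $kept . $m[3] . '>';
    }, $xml);
}

$raw = '<dCategory dec="1102" dup="45" dup="4576" loc="274" mov="31493" prf="23469" unq="240031" xxx="7861" />';
$xml = simplexml_load_string(dropDuplicateAttributes($raw));
echo $xml['dup'], "\n"; // prints 45
```

Closing tags and the <?xml ...?> declaration are left untouched because the pattern requires a letter or underscore right after '<'.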
The answers in the linked question address errors in characters within the XML itself, e.g. & not appearing as &amp;. The problem here is that the structure of the XML is broken, not the content. The answer in that thread produces
parser error : Attribute attr1 redefined
when presented with the XML
<open-1 attr1="atr1" attr1="atr1">Text</open-1>
That is exactly the kind of input I am trying to parse.
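To inspect the failure instead of having it blow up, libxml errors can be collected rather than emitted as PHP warnings. A minimal sketch using libxml_use_internal_errors() and libxml_get_errors(); it does not repair anything, it only makes the "Attribute attr1 redefined" error available to your code. The exact message wording comes from libxml2 and may vary between versions.

```php
<?php
// Collect libxml errors ourselves instead of letting them surface as PHP warnings
libxml_use_internal_errors(true);

$input = '<open-1 attr1="atr1" attr1="atr1">Text</open-1>';
$xml = simplexml_load_string($input); // false: the document is not well-formed

$messages = [];
if ($xml === false) {
    foreach (libxml_get_errors() as $error) {
        // LibXMLError exposes ->line, ->column, ->level and ->message
        $messages[] = sprintf('line %d: %s', $error->line, trim($error->message));
    }
    libxml_clear_errors();
}

echo implode("\n", $messages), "\n";
```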
You could use PHP's tidy extension to clean up your input:
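<?php
// Input with a repeated attribute, which libxml rejects as not well-formed
$buffer = '<?xml version="1.0" encoding="UTF-8"?><open-1 attr1="atr1" attr1="atr1">Text</open-1>';
// Read the input as XML and write XML back out, indented
$config = [
    'indent'     => true,
    'output-xml' => true,
    'input-xml'  => true,
];
// Parse and repair; the repaired document drops the repeated attribute
$tidy = tidy_parse_string($buffer, $config, 'utf8');
$tidy->cleanRepair();
echo $tidy;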
This will output:
<?xml version="1.0" encoding="utf-8"?>
<open-1 attr1="atr1">Text</open-1>
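Since the question uses SimpleXML, the repaired string can be handed straight to simplexml_load_string(). A minimal sketch: loadRepairedXml() is a hypothetical name, and it assumes the tidy extension is installed (it returns null otherwise). Which copy of a repeated attribute survives is worth checking against the real feed, since dup="45" and dup="4576" differ; tidy has a repeated-attributes option (keep-first or keep-last), but how it behaves together with input-xml has not been confirmed here.

```php
<?php
// Hypothetical helper: repair the input with tidy, then parse the result with SimpleXML.
// Requires the tidy extension; returns null when tidy is missing or parsing still fails.
function loadRepairedXml(string $buffer): ?SimpleXMLElement
{
    if (!extension_loaded('tidy')) {
        return null;
    }
    $config = [
        'input-xml'  => true,
        'output-xml' => true,
    ];
    $tidy = tidy_parse_string($buffer, $config, 'utf8');
    $tidy->cleanRepair();

    // tidy_get_output() returns the repaired markup as a string
    $xml = simplexml_load_string(tidy_get_output($tidy));
    return $xml === false ? null : $xml;
}

$feed = '<?xml version="1.0" encoding="UTF-8"?><open-1 attr1="atr1" attr1="atr1">Text</open-1>';
$xml = loadRepairedXml($feed);
if ($xml !== null) {
    echo $xml['attr1'], ' ', $xml, "\n";
}
```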