
Getting PHP to acknowledge XML errors

Tags:

php

xml

I am having some grief with an XML feed that I am being sent. I know it is invalid, but the development cycle of the sending program is such that it is not worth waiting for them to correct the error. So I am looking for a workaround: some way to get PHP to let me read the XML and merge or drop the invalid attribute entries while keeping all the others.

The fault is that I have duplicate attributes on an XML node. I have been using SimpleXML to read the files and process them into useful values, but this line just breaks the system outright. The offending XML looks like this:

<dCategory dec="1102" dup="45" dup="4576" loc="274" mov="31493" prf="23469" unq="240031" xxx="7861" />

What I would really like is the PHP equivalent of C#'s .MoveToNextAttribute() on the XML reader. I can't seem to find anything that doesn't just blow up when presented with the duplicate attribute.

Can anyone help out with this?

The linked answers address errors in characters within the XML itself, e.g. & not appearing as &amp;. The problem here is that the structure of the XML is broken, not the content. The answer in that thread returns

 parser error : Attribute attr1 redefined

when presented with the XML

<open-1 attr1="atr1" attr1="atr1">Text</open-1>

Which is what I am trying to parse.
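For reference, here is a minimal sketch of how the failure can be observed without PHP warnings, using libxml's internal error handling. The variable names are illustrative only, and this shows the error being reported rather than a fix: the duplicate attribute is a well-formedness error, so SimpleXML still returns false.

```php
<?php

// Collect libxml errors internally instead of emitting PHP warnings.
libxml_use_internal_errors(true);

$broken = '<open-1 attr1="atr1" attr1="atr1">Text</open-1>';
$doc = simplexml_load_string($broken);

$messages = [];
if ($doc === false) {
    foreach (libxml_get_errors() as $error) {
        // Each entry is a LibXMLError carrying level, code, line, column and message.
        $messages[] = trim($error->message);
        printf("line %d, column %d: %s\n", $error->line, $error->column, trim($error->message));
    }
    // Empty the error buffer so later parses start clean.
    libxml_clear_errors();
}
```

The reported message should include the same "Attribute attr1 redefined" text quoted above.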

Khainestar asked Jan 19 '16 15:01



1 Answer

You could use tidy to clean up your input:

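<?php

// Sample input with a repeated attribute on the root element.
$buffer = '<?xml version="1.0" encoding="UTF-8"?><open-1 attr1="atr1" attr1="atr1">Text</open-1>';

$config = [
 'indent' => true,      // pretty-print the output
 'output-xml' => true,  // write the result as well-formed XML
 'input-xml' => true,   // parse the input as XML rather than HTML
];

// Parse the string with the options above, then run tidy's clean and repair pass.
$tidy = tidy_parse_string($buffer, $config, 'UTF8');
$tidy->cleanRepair();
echo $tidy;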

This will output:

 <?xml version="1.0" encoding="utf-8"?>
 <open-1 attr1="atr1">Text</open-1>
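
If the tidy extension is not available, a different technique is a small pre-processing pass that strips repeated attributes before the string reaches SimpleXML. The sketch below is not part of tidy or of the original answer: `dropDuplicateAttributes` is a hypothetical helper, it keeps the first value of each repeated attribute, and it assumes every attribute value is quoted. Because it works on raw text, it would also rewrite tag-like text inside comments or CDATA sections, so treat it as a starting point only.

```php
<?php

/**
 * Hypothetical helper: removes repeated attributes from each start tag,
 * keeping the first occurrence. Assumes all attribute values are quoted.
 */
function dropDuplicateAttributes(string $xml): string
{
    // Match a start tag: "<name", its run of attributes, then ">" or "/>".
    $tagPattern = '/<([A-Za-z_][\w.\-:]*)((?:\s+[\w.\-:]+\s*=\s*(?:"[^"]*"|\'[^\']*\'))+)\s*(\/?)>/';
    $attrPattern = '/\s+([\w.\-:]+)\s*=\s*("[^"]*"|\'[^\']*\')/';

    $result = preg_replace_callback(
        $tagPattern,
        function (array $tag) use ($attrPattern): string {
            $seen = [];
            $kept = '';
            // Walk the attributes in document order and skip names already seen.
            preg_match_all($attrPattern, $tag[2], $attrs, PREG_SET_ORDER);
            foreach ($attrs as $attr) {
                if (!isset($seen[$attr[1]])) {
                    $seen[$attr[1]] = true;
                    $kept .= ' ' . $attr[1] . '=' . $attr[2];
                }
            }
            return '<' . $tag[1] . $kept . ($tag[3] === '/' ? ' />' : '>');
        },
        $xml
    );

    // preg_replace_callback returns null on a regex error; fall back to the input.
    return $result ?? $xml;
}

$broken = '<dCategory dec="1102" dup="45" dup="4576" loc="274" />';
$clean  = dropDuplicateAttributes($broken);
$node   = simplexml_load_string($clean);

echo $clean, PHP_EOL;               // <dCategory dec="1102" dup="45" loc="274" />
echo (string) $node['dup'], PHP_EOL; // 45
```

Keeping the first value is an arbitrary choice here; if the later value should win, or the values should be merged, the loop inside the callback is the place to change that.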
Nikos answered Nov 16 '22 14:11
