
PHP SimpleXML doesn't preserve line breaks in XML attributes

Tags: php, xml, simplexml

I have to parse externally provided XML whose attributes contain line breaks. With SimpleXML, those line breaks are lost. According to another Stack Overflow question, line breaks in attribute values are valid XML (even if far from ideal).

Why are they lost, and how can I preserve them?

Here is a demo script (note that line breaks in element content, as opposed to an attribute, are preserved).

PHP File with embedded XML

$xml = <<<XML
<?xml version="1.0" encoding="utf-8"?>
<Rows>
    <data Title='Data Title' Remarks='First line of the row.
Followed by the second line.
Even a third!' />
    <data Title='Full Title' Remarks='None really'>First line of the row.
Followed by the second line.
Even a third!</data>
</Rows>
XML;

// Parse the document and dump the resulting object tree.
$rows = new SimpleXMLElement($xml);
print '<pre>'; print_r($rows); print '</pre>';

Output from print_r

SimpleXMLElement Object
(
    [data] => Array
        (
            [0] => SimpleXMLElement Object
                (
                    [@attributes] => Array
                        (
                            [Title] => Data Title
                            [Remarks] => First line of the row. Followed by the second line. Even a third!
                        )

                )

            [1] => First line of the row.
Followed by the second line.
Even a third!
        )

)
asked Sep 21 '09 at 23:09 by Joshua


1 Answer

Using SimpleXML, the line breaks seem to be lost.

Yes, that is expected. In fact, every conformant XML parser is required to turn a literal newline in an attribute value into a single space. See attribute-value normalisation (section 3.3.3) in the XML spec.

If the attribute value was meant to contain a real newline character, the XML should have used the &#10; character reference instead of a raw newline.
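As a minimal sketch of the difference, the script below parses two attributes with SimpleXML: one with a raw newline and one with the &#10; character reference. The XML document and the variable names ($rows, $raw, $escaped) are made up for illustration and are not part of the original question.

```php
<?php
// Illustrative document: the same text, once with a raw newline in the
// attribute value and once with the &#10; character reference.
$xml = <<<XML
<?xml version="1.0" encoding="utf-8"?>
<Rows>
    <data Remarks='First line.
Second line.' />
    <data Remarks='First line.&#10;Second line.' />
</Rows>
XML;

$rows = new SimpleXMLElement($xml);

// Cast the attribute objects to plain strings before inspecting them.
$raw     = (string) $rows->data[0]['Remarks'];
$escaped = (string) $rows->data[1]['Remarks'];

// The raw newline should have been normalised to a space by the parser;
// the character reference should survive as a real newline.
var_dump(strpos($raw, "\n") !== false);
var_dump(strpos($escaped, "\n") !== false);
```

This only helps when you control how the XML is produced. For XML that already arrives with raw newlines in its attributes, the parser has no way to tell them apart from spaces after normalisation.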

answered Oct 11 '22 at 18:10 by bobince