 

Remove &amp from string when writing to xml in PHP

Tags:

php

xml

I am trying to use DOMDocument to write a link that contains the & sign into an XML file. When I do this, the & becomes &amp; in the XML, so product=1&qty=1 becomes product=1&amp;qty=1.

Can you please tell me a way to avoid this?

Catalin asked Jun 16 '11 22:06



2 Answers

As Gordon said, this is how ampersands have to be encoded in XML. If you didn't encode the & as &amp;, the XML file would be malformed and you'd get errors parsing it. When you take the string back out of the XML file, if the &amp; still shows up, either use str_replace() like this:

$str = str_replace('&amp;', '&', $str);

Or use htmlspecialchars_decode():

$str = htmlspecialchars_decode($str);

The added bonus of using htmlspecialchars_decode() is that it also decodes the other special HTML entities that might be in the string, such as &lt;, &gt; and &quot;. For more, see the PHP manual entry for htmlspecialchars_decode().
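As a rough illustration of the round trip described above, the sketch below writes a link with DOMDocument and reads it back. The URL and the element names (links, link) are made up for this example. It assumes the value is added with createTextNode(), which escapes the & when the document is serialized. Reading nodeValue back through the parser should return the literal &, so the manual decoding above would only be needed when you work with the raw XML string yourself.

```php
<?php
// Hypothetical URL and element names, chosen only for illustration.
$url = 'http://example.com/cart?product=1&qty=1';

$doc = new DOMDocument('1.0', 'UTF-8');
$root = $doc->createElement('links');
$doc->appendChild($root);

$link = $doc->createElement('link');
// createTextNode() takes the literal string; the & is escaped on output.
$link->appendChild($doc->createTextNode($url));
$root->appendChild($link);

// The serialized XML contains &amp;, as it must.
$xml = $doc->saveXML();

// Parsing the XML again turns &amp; back into a plain &.
$parsed = new DOMDocument();
$parsed->loadXML($xml);
$readBack = $parsed->getElementsByTagName('link')->item(0)->nodeValue;

// Decoding by hand, for the case where only the raw string is available.
$decoded = htmlspecialchars_decode('product=1&amp;qty=1');
```

Here $xml holds the escaped form while $readBack equals the original $url, which suggests the &amp; in the file is not something that needs to be avoided.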

Bojangles answered Nov 07 '22 10:11



Ampersands should be encoded like this. Changing it would be wrong.

See http://www.w3.org/TR/xml/

The ampersand character (&) and the left angle bracket (<) MUST NOT appear in their literal form, except when used as markup delimiters, or within a comment, a processing instruction, or a CDATA section. If they are needed elsewhere, they MUST be escaped using either numeric character references or the strings &amp; and &lt; respectively.

and http://www.w3.org/TR/xhtml1/#C_12

In both SGML and XML, the ampersand character ("&") declares the beginning of an entity reference (e.g., &reg; for the registered trademark symbol "®"). Unfortunately, many HTML user agents have silently ignored incorrect usage of the ampersand character in HTML documents - treating ampersands that do not look like entity references as literal ampersands. XML-based user agents will not tolerate this incorrect usage, and any document that uses an ampersand incorrectly will not be "valid", and consequently will not conform to this specification. In order to ensure that documents are compatible with historical HTML user agents and XML-based user agents, ampersands used in a document that are to be treated as literal characters must be expressed themselves as an entity reference (e.g. "&amp;"). For example, when the href attribute of the a element refers to a CGI script that takes parameters, it must be expressed as http://my.site.dom/cgi-bin/myscript.pl?class=guest&amp;name=user rather than as http://my.site.dom/cgi-bin/myscript.pl?class=guest&name=user
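To see what the quoted passages mean in practice, here is a small sketch that feeds both forms of the example URL to DOMDocument. The wrapping link element is an assumption made for this example, and libxml_use_internal_errors() is used only to collect the parser errors instead of emitting warnings.

```php
<?php
// Collect libxml errors instead of printing PHP warnings.
libxml_use_internal_errors(true);

// The <link> wrapper is hypothetical; the URLs come from the XHTML quote.
$bad  = '<link>http://my.site.dom/cgi-bin/myscript.pl?class=guest&name=user</link>';
$good = '<link>http://my.site.dom/cgi-bin/myscript.pl?class=guest&amp;name=user</link>';

// A bare & starts an entity reference, so this document is not well-formed.
$badDoc = new DOMDocument();
$badOk = $badDoc->loadXML($bad);
$errors = libxml_get_errors();
libxml_clear_errors();

// The escaped form parses, and the parser hands back the literal &.
$goodDoc = new DOMDocument();
$goodOk = $goodDoc->loadXML($good);
$value = $goodDoc->documentElement->nodeValue;
```

With the bare ampersand, loadXML() should fail and $errors should be non-empty; with &amp; the document loads and $value contains class=guest&name=user.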

Gordon answered Nov 07 '22 11:11
