Is there any function I can use to process an arbitrary string so that it won't cause XML parsing problems? I have a PHP script outputting an XML file with content obtained from forms.
The thing is, apart from the usual string checks from a PHP form, some of the user text causes XML parsing errors. I'm facing this "&rsquo;" in particular. This is the error I'm getting: Entity 'rsquo' not defined
Does anyone have any experience in encoding text for xml output?
Thank you!
Some clarification: I'm outputting content from forms in an XML file, which is subsequently parsed by JavaScript.
I process all form inputs with: htmlentities(trim($_POST['content']), ENT_QUOTES, 'UTF-8');
When I want to output this content into an XML file, how should I encode it so that it won't throw XML parsing errors?
So far the following 2 solutions work:
1) echo '<content><![CDATA['.$content.']]></content>';
2) echo '<content>'.htmlspecialchars(html_entity_decode($content, ENT_QUOTES, 'UTF-8'),ENT_QUOTES, 'UTF-8').'</content>'."\n";
Are the above 2 solutions safe? Which is better?
Thanks, sorry for not providing this information earlier.
You're approaching this the wrong way: don't look for a parser that doesn't give you errors. Instead, try to produce well-formed XML.
How did you get &rsquo; from the user? If they literally typed it in, you are not processing the input correctly: for example, you should escape & to &amp;. If it is you who put the entity there (perhaps in place of some apostrophe), either define it in a DTD (<!ENTITY rsquo "&#x2019;">) or write it using numeric notation (&#x2019;), because almost all of the named entities are part of HTML. XML defines only a few basic ones, as Gumbo pointed out.
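To illustrate the point about HTML-only named entities, here is a minimal sketch, assuming the form text was stored with htmlentities() as described in the question. The helper name xml_text() is hypothetical; it decodes the HTML entities back to plain characters and then escapes only the characters that XML itself requires.

```php
<?php
// Hypothetical helper (not from the thread): convert text stored with
// htmlentities() into text that is safe inside an XML element.
function xml_text(string $stored): string
{
    // Undo HTML named entities such as &rsquo; so they become real characters.
    $plain = html_entity_decode($stored, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    // Re-escape only &, <, >, " and ' using entities that XML defines.
    return htmlspecialchars($plain, ENT_QUOTES | ENT_XML1, 'UTF-8');
}

// Simulate what the question's form handling would have stored.
$stored = htmlentities("Tom\u{2019}s <b> & co", ENT_QUOTES, 'UTF-8');
echo '<content>' . xml_text($stored) . '</content>', "\n";
```

The right single quote ends up in the output as a literal UTF-8 character, so the XML document should declare (or default to) UTF-8 encoding.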
EDIT based on additions to the question:
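With solution 1 (CDATA), if the user types in ]]> <°)))><, you have a problem, because the ]]> ends the CDATA section early.
With solution 2, [...] &amp; which should be interpreted like &). If you use htmlspecialchars() with ENT_QUOTES, it should be OK, but see how Drupal does it.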
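For the CDATA variant, the problem with a literal ]]> in user text can be worked around by splitting that sequence across two CDATA sections. This is only a sketch of a common technique, not necessarily what Drupal does; the helper name xml_cdata() is hypothetical.

```php
<?php
// Hypothetical helper (not from the thread): wrap text in a CDATA section,
// splitting any "]]>" so it cannot terminate the section early.
function xml_cdata(string $text): string
{
    // "]]>" becomes "]]" (end of one section) followed by ">" in a new section.
    $safe = str_replace(']]>', ']]]]><![CDATA[>', $text);
    return '<![CDATA[' . $safe . ']]>';
}

// The troublesome input from the answer above.
echo '<content>' . xml_cdata('a ]]> <°)))>< b') . '</content>', "\n";
```

An XML parser reads the adjacent CDATA sections as one run of character data, so the original text, including its ]]>, comes back unchanged.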