
PHP XML output produces parse error "’"

Is there a function I can use to process any string so that it won't cause XML parsing problems? I have a PHP script that outputs an XML file with content obtained from forms.

The thing is, apart from the usual string checks on PHP form input, some of the user text causes XML parsing errors. I'm facing this "’" in particular. This is the error I'm getting: Entity 'rsquo' not defined

Does anyone have any experience in encoding text for xml output?

Thank you!


Some clarification: I'm writing content from forms to an XML file, which is subsequently parsed by JavaScript.

I process all form inputs with: htmlentities(trim($_POST['content']), ENT_QUOTES, 'UTF-8');
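A quick sketch of why the error appears (assuming PHP 7+ for the \u{...} string escape; the sample string is made up): htmlentities() converts the typographic apostrophe into the HTML-only entity &rsquo;, while htmlspecialchars() leaves it as a plain UTF-8 character.

```php
<?php
// Hypothetical sample input containing a right single quotation mark (U+2019).
$input = "It\u{2019}s done";

// htmlentities() maps every character that has an HTML named entity,
// so U+2019 becomes &rsquo;, which XML does not predefine.
$html = htmlentities($input, ENT_QUOTES, 'UTF-8');

// htmlspecialchars() only touches &, <, >, " and ', so U+2019 stays
// a literal UTF-8 character, which is fine in a UTF-8 XML file.
$xmlSafe = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');

echo $html, "\n";    // It&rsquo;s done
echo $xmlSafe, "\n"; // It’s done
```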

When I output this content into an XML file, how should I encode it so that it won't cause XML parsing errors?

So far the following 2 solutions work:

1) echo '<content><![CDATA['.$content.']]></content>';

2) echo '<content>'.htmlspecialchars(html_entity_decode($content, ENT_QUOTES, 'UTF-8'),ENT_QUOTES, 'UTF-8').'</content>'."\n";

Are the above 2 solutions safe? Which is better?

Thanks, sorry for not providing this information earlier.

asked Jun 29 '10 16:06 by Lyon


1 Answer

You're going about it the wrong way: don't look for a parser that doesn't give you errors. Instead, make sure your XML is well-formed.

How did you get &rsquo; from the user? If the user literally typed it in, you are not processing the input correctly; for example, you should escape & to &amp;. If it is you who put the entity there (perhaps in place of an apostrophe), either declare it in a DTD (<!ENTITY rsquo "&#x2019;">) or write it using numeric notation (&#x2019;), because almost all named entities are part of HTML. XML defines only a few basic ones (&lt;, &gt;, &amp;, &quot; and &apos;), as Gumbo pointed out.
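A minimal sketch of both suggestions, assuming UTF-8 input and PHP 7+. The helpers to_xml_numeric() and utf8_code_point() are hypothetical, not part of PHP. The first decodes HTML named entities, escapes the XML special characters with htmlspecialchars(), and writes any remaining non-ASCII character as a numeric reference such as &#x2019;. The last line shows the "user literally typed it" case, where escaping & keeps &rsquo; as plain text.

```php
<?php
// Decode one UTF-8 encoded character into its Unicode code point.
// Assumes $ch is a single valid UTF-8 sequence (as matched by the /u regex below).
function utf8_code_point(string $ch): int
{
    $b = array_values(unpack('C*', $ch));
    switch (count($b)) {
        case 1:
            return $b[0];
        case 2:
            return (($b[0] & 0x1F) << 6) | ($b[1] & 0x3F);
        case 3:
            return (($b[0] & 0x0F) << 12) | (($b[1] & 0x3F) << 6) | ($b[2] & 0x3F);
        default:
            return (($b[0] & 0x07) << 18) | (($b[1] & 0x3F) << 12)
                | (($b[2] & 0x3F) << 6) | ($b[3] & 0x3F);
    }
}

// Hypothetical helper: turn HTML named entities (such as &rsquo;) back into
// characters, escape &, <, >, " and ', then write every non-ASCII character
// as a hexadecimal numeric reference that any XML parser accepts.
function to_xml_numeric(string $html): string
{
    $text = html_entity_decode($html, ENT_QUOTES | ENT_HTML401, 'UTF-8');
    $escaped = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
    return preg_replace_callback(
        '/[^\x00-\x7F]/u',
        function (array $m): string {
            return sprintf('&#x%04X;', utf8_code_point($m[0]));
        },
        $escaped
    );
}

echo to_xml_numeric('It&rsquo;s &lt;b&gt;'), "\n";         // It&#x2019;s &lt;b&gt;
echo htmlspecialchars('&rsquo;', ENT_QUOTES, 'UTF-8'), "\n"; // &amp;rsquo;
```

Numeric references are optional when the XML file is declared and served as UTF-8; they mainly help when the output encoding can't be relied on.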

EDIT based on additions to the question:

  • In #1, you wrap the content in a CDATA section, so if the user types in ]]> <°)))><, you have a problem: the ]]> closes the CDATA section early and the rest is parsed as markup.
  • In #2, you are decoding and then re-encoding, which brings you back to the original value of $content. The decoding should not be necessary (if you don't expect users to post values like &amp; that should be interpreted as &).
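One common workaround for the CDATA problem in #1, sketched here with a hypothetical cdata() helper, is to split every ]]> across two CDATA sections so the terminator never appears inside one:

```php
<?php
// Wrap arbitrary text in a CDATA section. A literal "]]>" would end the
// section early, so each occurrence is split across two CDATA sections:
// "]]" stays in the first, ">" starts the next.
function cdata(string $text): string
{
    return '<![CDATA[' . str_replace(']]>', ']]]]><![CDATA[>', $text) . ']]>';
}

// Hypothetical user input containing the problematic sequence.
$content = ']]> <°)))><';
echo '<content>' . cdata($content) . '</content>', "\n";
// <content><![CDATA[]]]]><![CDATA[> <°)))><]]></content>
```

This only fixes the terminator issue; the text still has to be valid in the document's encoding.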

If you use htmlspecialchars() with ENT_QUOTES, it should be ok, but see how Drupal does it.
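A sketch of that approach, assuming you keep the raw form value (for example trim($_POST['content'])) instead of running it through htmlentities() first, so no decoding step is needed. xml_element() is a hypothetical helper, and the element name is assumed to be a trusted constant, not user input:

```php
<?php
// Hypothetical helper: build one XML element from raw (not pre-encoded) text.
// ENT_QUOTES escapes both quote types; ENT_XML1 (PHP 5.4+) makes
// htmlspecialchars() emit &apos; instead of the HTML-style &#039;.
// $name is assumed to be a trusted, valid element name.
function xml_element(string $name, string $text): string
{
    return '<' . $name . '>'
        . htmlspecialchars($text, ENT_QUOTES | ENT_XML1, 'UTF-8')
        . '</' . $name . '>';
}

// Store the raw, trimmed form value; escape only when writing the XML.
$raw = trim("  It\u{2019}s <b>bold</b> & 'quoted'  ");
echo xml_element('content', $raw), "\n";
// <content>It’s &lt;b&gt;bold&lt;/b&gt; &amp; &apos;quoted&apos;</content>
```

On PHP versions without ENT_XML1, plain ENT_QUOTES still yields well-formed XML, just with &#039; for the single quote.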

answered Oct 29 '22 01:10 by Krab