
How can I make my xml safe for parsing (when it has & character in it)?

I've been given an XML string which I need to put through a parser. It's currently complaining because of an illegal XML character. Very simplified example:

<someXml>this & that</someXml>

I know that the solution is to replace & with &amp;, but I'm not generating the XML and therefore have no control over the values.

A simple string replace is not the right way to do this, since '&' has special meaning in XML and a global replace of '&' with '&amp;' would ruin the special meaning which was intended. Is there a way to take a full XML document and 'fix' it so that '&' becomes '&amp;', but only where intended? Am I safe to globally replace ' & ' with ' &amp; ' (note the spaces on either side)?

Chris Knight asked Dec 16 '22 14:12



2 Answers

I would suggest asking the provider of this document to fix it. As it is, it's not (well-formed) XML! If they committed themselves to the XML format, they should fix it.

Puce answered Jan 26 '23 01:01



I think this is an interesting question, because it's a situation that can really happen in real life. Although I believe the right thing to do is to ask the XML provider to fix the XML and make it well-formed, I thought one option was to try a lenient parser. I did some searching and found this blog post, which discusses the same problem and suggests the same solution I was thinking of. You may try jsoup. Let me repeat that I don't think this is the best thing to do: you should really ask the XML provider to fix it.
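If the provider can't be reached and a lenient parser isn't an option, a different technique (not the one from the blog post) is a regex pre-pass that escapes only those '&' characters which do not already start an entity or character reference. The following is a minimal sketch using only the JDK; the class and method names are made up for illustration, and it assumes the document contains no CDATA sections or comments with literal '&' in them.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
class AmpersandFixer {

    // Matches '&' unless it begins a named reference (&amp;),
    // a decimal reference (&#38;) or a hex reference (&#x26;).
    private static final Pattern BARE_AMPERSAND =
        Pattern.compile("&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)");

    // Returns the input with every bare '&' replaced by "&amp;".
    // Assumption: no CDATA sections or comments, where '&' is legal as-is.
    static String escapeBareAmpersands(String xml) {
        return BARE_AMPERSAND.matcher(xml).replaceAll("&amp;");
    }

    public static void main(String[] args) {
        System.out.println(escapeBareAmpersands("<someXml>this & that</someXml>"));
        System.out.println(escapeBareAmpersands("<someXml>this &amp; that</someXml>"));
    }
}
```

This also handles cases that replacing ' & ' with ' &amp; ' would miss, such as "AT&T". It leaves anything shaped like a named reference alone, so an undeclared entity such as &copy; will still be rejected by a strict XML parser; treat it as a stopgap rather than a real fix.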

MarcoS answered Jan 26 '23 00:01
