
illegal character in xml document

Tags: .net, xml

I have a program that generates XML files from data in a database. In short, the code does the following:

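using System.Data;
using System.Data.SqlClient;
using System.Xml;

string dsn = "a db connection string";
XmlDocument d = new XmlDocument();
using (SqlConnection con = new SqlConnection(dsn)) {
    con.Open();
    string sql = "select id as Id, comment as Comment from Test where ... ";
    using (SqlCommand cmd = new SqlCommand(sql, con)) {
        // The DataSet name becomes the root element, the table name the row element.
        DataSet ds = new DataSet("EXPORT");
        SqlDataAdapter da = new SqlDataAdapter(cmd);
        da.Fill(ds, "Test");
        // Serialize the DataSet to an XML string and load it into the document.
        d.LoadXml(ds.GetXml());
    }
}
d.Save(@"c:\test.xml");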

When I look at the XML file, it contains the invalid character reference &#x1A;:

<EXPORT>
  <Test>
    <Id>2</Id>
    <Comment> Keyboard NB&#x1A;5 linked</Comment>
  </Test>
</EXPORT>

Firefox cannot open this XML file and reports an invalid character ...

That character is reserved in ISO 8859-1 and CP1252 and should not be rendered by browsers. But why does XmlDocument output XML that cannot be parsed as valid? Or is it a valid XML document that just cannot be parsed by browsers or imported by Excel and so on? Is there an easy way of getting rid of those reserved 'invalid characters', or of encoding them in a way that browsers do not have a problem with?

Many thanks for your opinions and tips.

Tobias Pirzer asked Jun 24 '10 13:06


3 Answers

Not all characters are representable in XML.

In XML 1.0, none of the characters with values less than 0x20 can be used, except for TAB (0x09), LF (0x0A) and CR (0x0D).

In XML 1.1, just about anything except NUL (0x00) can be used.

If you have the option to use XML 1.1, and the receiving program supports XML 1.1 (not many do), then you can escape the 0x1A as &#26; or &#x1A;.

Wrapping it in CDATA is not a solution either: a CDATA section is just a convenience for escaping groups of characters differently than the standard &-mechanism, and the same character restrictions apply inside it.

Otherwise, you will need to remove it prior to serializing.
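A minimal sketch of the "remove it prior to serializing" option, in C# like the question. The class and method names (XmlTextSanitizer, StripIllegalXml10Chars) are made up for illustration and are not part of .NET; the ranges are the XML 1.0 character ranges described above, plus the supplementary planes reached through surrogate pairs.

```csharp
using System;
using System.Text;

// Hypothetical helper, not a framework class.
public static class XmlTextSanitizer
{
    // True if the code point is allowed by the XML 1.0 Char production.
    public static bool IsLegalXml10CodePoint(int cp)
    {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Returns a copy of the text with every character that XML 1.0 forbids dropped.
    public static string StripIllegalXml10Chars(string text)
    {
        if (string.IsNullOrEmpty(text)) return text;

        var sb = new StringBuilder(text.Length);
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (char.IsHighSurrogate(c) && i + 1 < text.Length
                && char.IsLowSurrogate(text[i + 1]))
            {
                // A well-formed surrogate pair encodes a code point >= 0x10000, which is legal.
                sb.Append(c).Append(text[i + 1]);
                i++;
            }
            else if (IsLegalXml10CodePoint(c))
            {
                sb.Append(c);
            }
            // Anything else (control characters such as 0x1A, lone surrogates,
            // 0xFFFE, 0xFFFF) is silently dropped.
        }
        return sb.ToString();
    }
}
```

The helper would be applied to each string value before it reaches the XML, for example to the Comment column in the question. Dropping the character is only one policy; replacing it with a space or a visible placeholder would work the same way.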

lavinio answered Nov 17 '22 23:11


I've run into this a few times when creating/manipulating XML from SQL data.

"But why does XmlDocument output XML that cannot be parsed as valid? Or is it a valid XML document that just cannot be parsed by browsers or imported by Excel and so on?"

The XmlDocument doesn't perform any validation on the data that you send it; it leaves that to you (the developer). This XML document should be invalid in almost everything that uses XML (but I could be wrong about that ... you could always test it :P)
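One way to test it from .NET itself is to run the generated text through an XmlReader with character checking switched on, which as far as I know also rejects character references such as &#x1A;. This is only a sketch: the class and method names (XmlStrictCheck, IsWellFormedXml10) are invented for the example.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical helper, not a framework class.
public static class XmlStrictCheck
{
    // Reads the whole document with character checking enabled and reports
    // whether the reader accepted it. On failure, error holds the parser message.
    public static bool IsWellFormedXml10(string xml, out string error)
    {
        var settings = new XmlReaderSettings { CheckCharacters = true };
        try
        {
            using (var reader = XmlReader.Create(new StringReader(xml), settings))
            {
                while (reader.Read()) { }   // walk every node; throws on the first problem
            }
            error = null;
            return true;
        }
        catch (XmlException ex)
        {
            error = ex.Message;
            return false;
        }
    }
}
```

Passing it the contents of the saved file (for example via File.ReadAllText) should show whether a strict parser accepts what XmlDocument wrote.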

Almost every time I've hit this problem, I ended up replacing the offending data with either the proper character (if it has one) or just getting rid of it.

You could also try putting the text inside a CDATA section, but that will bloat the file a tiny bit (not sure how big overall your file will be), and a character that is illegal in XML 1.0 remains illegal inside CDATA, so this alone will not make the file parse.
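For the DataSet in the question, the replacement can be done on the data before GetXml() is called. A rough sketch, assuming the offending values sit in string columns; DataSetCleaner and CleanStringColumns are names invented here, and the cleaning rule is passed in so you can decide whether to replace or remove.

```csharp
using System;
using System.Data;

// Hypothetical helper, not a framework class.
public static class DataSetCleaner
{
    // Applies the given cleaning function to every non-null value
    // in every writable string column of every table in the DataSet.
    public static void CleanStringColumns(DataSet ds, Func<string, string> clean)
    {
        foreach (DataTable table in ds.Tables)
        {
            foreach (DataColumn col in table.Columns)
            {
                if (col.DataType != typeof(string) || col.ReadOnly) continue;

                foreach (DataRow row in table.Rows)
                {
                    if (!row.IsNull(col))
                        row[col] = clean((string)row[col]);
                }
            }
        }
    }
}
```

In the question's code this would go between da.Fill(ds, "Test") and ds.GetXml(), for example as DataSetCleaner.CleanStringColumns(ds, s => s.Replace("\u001A", "")) to simply drop the 0x1A character.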

Tony Abrams answered Nov 18 '22 00:11


Take a look at this question: "xml parse error on illegal character".

Conclusion (as I understood it): With XML 1.0 it is impossible to store this value.

Christian Kuetbach answered Nov 17 '22 23:11