
Parsing xml string to an xml document fails if the string begins with <?xml... ?> section

Tags: c#, .net, xml

I have an XML file beginning like this:

    <?xml version="1.0" encoding="utf-8"?>
    <Report xmlns:rd="http://schemas.microsoft.com/SQLServer/reporting/reportdesigner" xmlns="http://schemas.microsoft.com/sqlserver/reporting/2008/01/reportdefinition">
      <DataSources>

When I run the following code:

    byte[] fileContent = // gets bytes
    string stringContent = Encoding.UTF8.GetString(fileContent);
    XDocument xml = XDocument.Parse(stringContent);

I get the following XmlException:

Data at the root level is invalid. Line 1, position 1.

Cutting out the XML declaration (the version and encoding line) fixes the problem. Why? How do I process this XML correctly?

agnieszka asked Jan 21 '10 17:01



2 Answers

My first thought was an encoding mismatch: a .NET string is always UTF-16, yet the XML declaration says utf-8. It seems, though, that XDocument's parsing is quite forgiving in this respect.

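The problem is actually related to the UTF-8 preamble, or byte order mark (BOM), which is a three-byte signature optionally present at the start of a UTF-8 stream. These three bytes are a hint as to the encoding being used in the stream. Encoding.UTF8.GetString does not strip them, so the decoded string begins with the character U+FEFF, which XDocument.Parse rejects at line 1, position 1.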

You can determine the preamble of an encoding by calling the GetPreamble method on an instance of the System.Text.Encoding class. For example:

    // returns { 0xEF, 0xBB, 0xBF }
    byte[] preamble = Encoding.UTF8.GetPreamble();
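To see the effect for yourself, here is a minimal sketch that prepends the preamble to some XML bytes and checks what the decoded string looks like. The class and method names (BomDemo, WithBom, StartsWithBomChar, ParseFails) are made up for illustration and are not part of any library.

```csharp
using System;
using System.Text;
using System.Xml;
using System.Xml.Linq;

public static class BomDemo
{
    // Hypothetical helper: builds UTF-8 bytes that start with the BOM,
    // similar to a file saved as "UTF-8 with signature".
    public static byte[] WithBom(string xml)
    {
        byte[] preamble = Encoding.UTF8.GetPreamble();
        byte[] body = new UTF8Encoding(false).GetBytes(xml);
        byte[] result = new byte[preamble.Length + body.Length];
        Buffer.BlockCopy(preamble, 0, result, 0, preamble.Length);
        Buffer.BlockCopy(body, 0, result, preamble.Length, body.Length);
        return result;
    }

    // True when the decoded string begins with U+FEFF, i.e. GetString kept the BOM.
    public static bool StartsWithBomChar(byte[] bytes)
    {
        string text = Encoding.UTF8.GetString(bytes);
        return text.Length > 0 && text[0] == '\uFEFF';
    }

    // True when XDocument.Parse throws XmlException for the decoded string.
    public static bool ParseFails(byte[] bytes)
    {
        try
        {
            XDocument.Parse(Encoding.UTF8.GetString(bytes));
            return false;
        }
        catch (XmlException)
        {
            return true;
        }
    }
}
```

If your file really does start with the preamble, StartsWithBomChar should return true for its bytes, which would explain the error at line 1, position 1.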

The preamble should be handled correctly by XmlTextReader, so simply load your XDocument from an XmlTextReader:

    XDocument xml;
    using (var xmlStream = new MemoryStream(fileContent))
    using (var xmlReader = new XmlTextReader(xmlStream))
    {
        xml = XDocument.Load(xmlReader);
    }
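As a possible variation, assuming you target .NET Framework 4 or later, XDocument.Load also has an overload that accepts a Stream directly, so the intermediate XmlTextReader can be left out. The wrapper below (XmlBytes.LoadFromBytes) is a hypothetical name used only for this sketch.

```csharp
using System.IO;
using System.Xml.Linq;

public static class XmlBytes
{
    // Hypothetical helper: loads an XDocument straight from raw bytes.
    // The XML reader works on the byte stream, so it can detect the
    // encoding itself and skip a leading BOM.
    public static XDocument LoadFromBytes(byte[] fileContent)
    {
        using (var stream = new MemoryStream(fileContent))
        {
            return XDocument.Load(stream);
        }
    }
}
```

Either way, the point is the same: hand the parser the bytes rather than a pre-decoded string, and let it deal with the preamble.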
Dave Cluderay answered Sep 17 '22 14:09


If you only have bytes you could either load the bytes into a stream:

    XmlDocument oXML;

    using (MemoryStream oStream = new MemoryStream(oBytes))
    {
      oXML = new XmlDocument();
      oXML.Load(oStream);
    }

Or you could convert the bytes into a string (presuming that you know the encoding) before loading the XML:

    string sXml;
    XmlDocument oXml;

    sXml = Encoding.UTF8.GetString(oBytes);
    oXml = new XmlDocument();
    oXml.LoadXml(sXml);

I've shown my example as .NET 2.0 compatible. If you're using .NET 3.5, you can use XDocument instead of XmlDocument.

Load the bytes into a stream:

    XDocument oXML;

    using (MemoryStream oStream = new MemoryStream(oBytes))
    using (XmlTextReader oReader = new XmlTextReader(oStream))
    {
      oXML = XDocument.Load(oReader);
    }

Convert the bytes into a string:

    string sXml;
    XDocument oXml;

    sXml = Encoding.UTF8.GetString(oBytes);
    oXml = XDocument.Parse(sXml);
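One caveat with the string route: if the bytes start with a UTF-8 byte order mark, Encoding.UTF8.GetString keeps it as a leading U+FEFF character, and parsing that string is what fails in the question. A rough sketch of a workaround, assuming the input really is UTF-8, is to drop that character before parsing. BomStripping.ParseIgnoringBom is a hypothetical helper name, not a framework method.

```csharp
using System.Text;
using System.Xml.Linq;

public static class BomStripping
{
    // Hypothetical helper: decodes UTF-8 bytes and removes a leading
    // U+FEFF (the decoded BOM) before handing the text to XDocument.Parse.
    public static XDocument ParseIgnoringBom(byte[] bytes)
    {
        string text = Encoding.UTF8.GetString(bytes);
        if (text.Length > 0 && text[0] == '\uFEFF')
        {
            text = text.Substring(1);
        }
        return XDocument.Parse(text);
    }
}
```

Loading from a stream avoids the issue altogether, so this is mainly useful when the string is needed for something else as well.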
stevehipwell answered Sep 16 '22 14:09