 

Obtaining the XML encoding from an XML declaration fragment: XmlDeclaration is not supported for partial content parsing

I'm working on some code to read an XML fragment that contains an XML declaration, e.g. <?xml version="1.0" encoding="utf-8"?>, and parse the encoding. According to MSDN, I should be able to do it like this:

var nt = new NameTable();
var mgr = new XmlNamespaceManager(nt);
var context = new XmlParserContext(null, mgr, null, XmlSpace.None);

var reader = new System.Xml.XmlTextReader(@"<?xml version=""1.0"" encoding=""UTF-8""?>", 
    System.Xml.XmlNodeType.XmlDeclaration, context);

However, I'm getting a System.Xml.XmlException on the call to the System.Xml.XmlTextReader constructor with an error message:

XmlNodeType XmlDeclaration is not supported for partial content parsing.

I've googled this error both in quotes (exactly zero results; edit: now there's one result, this post) and without quotes, which yields nothing useful. I've also looked at the MSDN page for XmlNodeType, and it doesn't say that XmlDeclaration is unsupported.

What am I missing here? How can I get an XmlTextReader instance from an XML declaration fragment?

Note: my goal is only to determine the encoding of a partially built XML document, which I assume contains at least a declaration node; that's why I'm trying to get reader.Encoding. If there's another way to do that, I'm open to it.

At present, I'm parsing the declaration manually using regex, which is not the best approach.

Asked by rory.ap on Dec 15 '15 at 15:12


2 Answers

Update: getting the encoding from a full XML document or from an XML fragment:

Here's a way to get the encoding without resorting to a fake root element, using XmlReader.Create with ConformanceLevel.Fragment.

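// Needs: using System; using System.IO; using System.Xml;
private static string GetXmlEncoding(string xmlString)
{
    if (string.IsNullOrWhiteSpace(xmlString))
        throw new ArgumentException("The provided string value is null or empty.", nameof(xmlString));

    using (var stringReader = new StringReader(xmlString))
    {
        // Fragment conformance lets the reader accept input that has no root element.
        var settings = new XmlReaderSettings { ConformanceLevel = ConformanceLevel.Fragment };

        using (var xmlReader = XmlReader.Create(stringReader, settings))
        {
            if (!xmlReader.Read()) throw new ArgumentException(
                "The provided XML string does not contain enough data to be valid XML (see https://msdn.microsoft.com/en-us/library/system.xml.xmlreader.read)");

            // Only the XML declaration carries the document encoding; if the first node is
            // something else (e.g. an element with its own "encoding" attribute), return null.
            if (xmlReader.NodeType != XmlNodeType.XmlDeclaration) return null;

            // Returns null when the declaration has no encoding attribute.
            return xmlReader.GetAttribute("encoding");
        }
    }
}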

Here's the output for both a full XML document and a fragment:

[Screenshot: XML encoding with XmlReader.Create]

If you want a System.Text.Encoding object rather than the encoding name, you can modify the code like this:

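// Needs: using System.IO; using System.Text; using System.Xml;
private static Encoding GetXmlEncoding(string xmlString)
{
    using (var stringReader = new StringReader(xmlString))
    {
        var settings = new XmlReaderSettings { ConformanceLevel = ConformanceLevel.Fragment };

        // Dispose the XmlReader as well, not just the underlying StringReader.
        using (var reader = XmlReader.Create(stringReader, settings))
        {
            if (!reader.Read() || reader.NodeType != XmlNodeType.XmlDeclaration) return null;

            var encoding = reader.GetAttribute("encoding");

            // Encoding.GetEncoding(null) throws, so handle a declaration without an encoding attribute.
            return encoding == null ? null : Encoding.GetEncoding(encoding);
        }
    }
}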
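Encoding.GetEncoding throws ArgumentNullException for a null name and ArgumentException for a name the runtime doesn't recognize. On .NET Core and .NET 5+, only a handful of encodings (such as UTF-8, UTF-16, ASCII and ISO-8859-1) are available unless you call Encoding.RegisterProvider(CodePagesEncodingProvider.Instance), so a declaration like encoding="windows-1252" can fail there. A minimal sketch of a more forgiving conversion, assuming you'd rather get null back than an exception (EncodingNames and TryResolveEncoding are hypothetical names, not part of the answer above):

```csharp
using System;
using System.Text;

static class EncodingNames
{
    // Hypothetical helper: turns the declaration's encoding name into an Encoding,
    // returning null instead of throwing when the name is missing or unrecognized.
    public static Encoding TryResolveEncoding(string name)
    {
        if (string.IsNullOrWhiteSpace(name)) return null;

        try
        {
            return Encoding.GetEncoding(name);
        }
        catch (ArgumentException)
        {
            // Unknown name, or a legacy code page not registered on this runtime.
            return null;
        }
    }
}
```

For example, EncodingNames.TryResolveEncoding("UTF-8") returns a UTF-8 encoding, while a null or made-up name yields null, which the caller can treat as "fall back to UTF-8 or report an error".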

Old answer:

As you mentioned, XmlTextReader's Encoding property contains the encoding.

Here's the full source code of a console app, which hopefully is useful:

using System;
using System.IO;
using System.Text;
using System.Xml;

class Program
{
    static void Main(string[] args)
    {
        var asciiXML = @"<?xml version=""1.0"" encoding=""ASCII""?><note><to>Tove</to><from>Jani</from><heading>Reminder</heading><body>Don't forget me this weekend!</body></note>";
        var utf8XML = @"<?xml version=""1.0"" encoding=""UTF-8""?><note><to>Tove</to><from>Jani</from><heading>Reminder</heading><body>Don't forget me this weekend!</body></note>";

        var asciiResult = GetXmlEncoding(asciiXML);
        var utfResult = GetXmlEncoding(utf8XML);

        Console.WriteLine(asciiResult);
        Console.WriteLine(utfResult);

        Console.ReadLine();
    }

    private static Encoding GetXmlEncoding(string s)
    {
        // Give the reader bytes rather than a string, so it works out the encoding itself
        // (from a byte order mark or the XML declaration).
        using (var stream = new MemoryStream(Encoding.UTF8.GetBytes(s)))
        using (var xmlreader = new XmlTextReader(stream))
        {
            // Read past the XML declaration to the first content node (the root element).
            xmlreader.MoveToContent();
            return xmlreader.Encoding;
        }
    }
}

Here's the output from the program:

[Screenshot: XML encoding output]

If you know that the XML contains only the declaration, you could append an empty root element, for example:

        var fragmentResult = GetXmlEncoding(xmlFragment + "<root/>");

[Screenshot: XML fragment]
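Put together, the empty-root idea might look like the sketch below. It assumes the input is the declaration alone, as in the question; FragmentEncoding and GetFragmentEncoding are hypothetical names, and the reader is fed UTF-8 bytes so that XmlTextReader determines the encoding itself.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

static class FragmentEncoding
{
    // Hypothetical helper: appends an empty root so a declaration-only string becomes
    // a well-formed document, then lets XmlTextReader report the encoding.
    public static Encoding GetFragmentEncoding(string declaration)
    {
        var bytes = Encoding.UTF8.GetBytes(declaration + "<root/>");
        using (var stream = new MemoryStream(bytes))
        using (var reader = new XmlTextReader(stream))
        {
            // Moves past the XML declaration to the <root/> element.
            reader.MoveToContent();
            return reader.Encoding;
        }
    }
}
```

Calling FragmentEncoding.GetFragmentEncoding(@"<?xml version=""1.0"" encoding=""UTF-8""?>") should give a UTF-8 encoding whose WebName is "utf-8".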

Answered by Mikael Koskinen on Sep 27 '22 at 17:09


Good evening, here's a solution that returns a System.Text.Encoding. I've written it to be clear and to go step by step.

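using System.IO;
using System.Linq;
using System.Xml.Linq;

class Program
{
    const string YourFileName = "your-file.xml"; // placeholder: path to your XML file

    static void Main(string[] args)
    {
        // Assumes the whole XML declaration is on the first line of the file.
        var line = File.ReadLines(YourFileName).First();
        var correctXml = line + "<Root></Root>"; // empty root makes it a complete document
        var xml = XDocument.Parse(correctXml);
        // Declaration is null if absent; Encoding is null if it has no encoding attribute.
        var stringEncoding = xml.Declaration?.Encoding;
        var encoding = stringEncoding == null ? null : System.Text.Encoding.GetEncoding(stringEncoding);
    }
}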
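If the declaration might not sit entirely on the first line, or the file might have no declaration at all, one alternative is to let XmlReader parse just the first node straight from the file, with no fake root. A minimal sketch, assuming a local file path; XmlFileEncoding and ReadDeclaredEncoding are hypothetical names, and the result is the declared name as a string (pass it to Encoding.GetEncoding if you need an Encoding). Because the reader decodes the file itself, a declared encoding that the runtime doesn't support can make Read() throw.

```csharp
using System;
using System.IO;
using System.Xml;

static class XmlFileEncoding
{
    // Hypothetical helper: returns the encoding name from the file's XML declaration,
    // or null when the file has no declaration or the declaration has no encoding attribute.
    public static string ReadDeclaredEncoding(string path)
    {
        var settings = new XmlReaderSettings { ConformanceLevel = ConformanceLevel.Fragment };
        using (var reader = XmlReader.Create(path, settings))
        {
            // Only the first node is parsed, so later content is not validated here.
            if (!reader.Read() || reader.NodeType != XmlNodeType.XmlDeclaration) return null;
            return reader.GetAttribute("encoding");
        }
    }
}
```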
Answered by Vasilievski on Sep 27 '22 at 16:09