To load XML files with arbitrary encoding I have the following code:
Encoding encoding;
// First open: read past the XML declaration so the reader reports the file's encoding
using (var reader = new XmlTextReader(filepath))
{
    reader.MoveToContent();
    encoding = reader.Encoding;
}
// Second open: parse the file, passing the detected encoding via the parser context
var settings = new XmlReaderSettings { NameTable = new NameTable() };
var xmlns = new XmlNamespaceManager(settings.NameTable);
var context = new XmlParserContext(null, xmlns, "", XmlSpace.Default,
    encoding);
using (var reader = XmlReader.Create(filepath, settings, context))
{
    return XElement.Load(reader);
}
This works, but it seems a bit inefficient to open the file twice. Is there a better way to detect the encoding such that I can do:
In IBM Enterprise COBOL, XML documents generated in or parsed from national data items must be encoded in Unicode UTF-16 in big-endian format (CCSID 1200).
UTF-8 (Unicode Transformation Format, 8-bit encoding form) is designed for ease of use with existing ASCII-based systems and can represent every character in the Unicode standard.
UTF-8 is the default character encoding for XML documents. UTF-8 is also the default encoding for HTML5, CSS, JavaScript, PHP, and SQL.
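The UTF-8 default can be observed through the same XmlTextReader.Encoding property the question relies on. The sketch below is an illustration under assumptions: it uses C# 9 top-level statements, DetectXmlEncoding is a hypothetical helper name, and the two in-memory inputs are made up. One buffer has neither a byte order mark nor an encoding declaration, so the reader is expected to fall back to UTF-8; the other is UTF-16 with a BOM, which the reader should recognize.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical helper: reports the encoding XmlTextReader settles on for a raw byte buffer.
static Encoding DetectXmlEncoding(byte[] bytes)
{
    using (var reader = new XmlTextReader(new MemoryStream(bytes)))
    {
        reader.MoveToContent();   // reads past any BOM and XML declaration
        return reader.Encoding;
    }
}

// No BOM and no encoding declaration: the reader should fall back to UTF-8.
byte[] plain = Encoding.ASCII.GetBytes("<root>x</root>");

// UTF-16 little-endian, prefixed with its byte order mark (FF FE).
byte[] bom = Encoding.Unicode.GetPreamble();
byte[] body = Encoding.Unicode.GetBytes("<root>x</root>");
byte[] utf16 = new byte[bom.Length + body.Length];
bom.CopyTo(utf16, 0);
body.CopyTo(utf16, bom.Length);

Console.WriteLine($"no declaration  -> {DetectXmlEncoding(plain).WebName}");
Console.WriteLine($"UTF-16 with BOM -> {DetectXmlEncoding(utf16).WebName}");
```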
UTF stands for UCS Transformation Format, and UCS stands for Universal Character Set. The number 8 or 16 is the size in bits of the code unit used to encode characters: UTF-8 uses one to four 8-bit units (1 to 4 bytes) per character, and UTF-16 uses one or two 16-bit units (2 or 4 bytes). An XML document without encoding information (no byte order mark and no encoding declaration) is read as UTF-8 by default.
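To make the byte counts concrete, a small sketch can ask Encoding.UTF8 and Encoding.BigEndianUnicode (UTF-16, big-endian) how many bytes each of four sample characters needs. The sample characters are arbitrary picks, one per UTF-8 length, and the program assumes C# 9 top-level statements.

```csharp
using System;
using System.Text;

// One sample per UTF-8 length: U+0041, U+00E9, U+20AC and U+1F600 (outside the BMP).
string[] samples = { "A", "\u00E9", "\u20AC", "\U0001F600" };

foreach (string s in samples)
{
    int codePoint = char.ConvertToUtf32(s, 0);
    int utf8Bytes = Encoding.UTF8.GetByteCount(s);              // 1 to 4 bytes
    int utf16Bytes = Encoding.BigEndianUnicode.GetByteCount(s); // 2 bytes, or 4 for a surrogate pair
    Console.WriteLine($"U+{codePoint:X4}: UTF-8 = {utf8Bytes} bytes, UTF-16 = {utf16Bytes} bytes");
}
```

For these four characters it reports 1, 2, 3 and 4 bytes in UTF-8 and 2, 2, 2 and 4 bytes in UTF-16; only U+1F600 needs a surrogate pair in UTF-16.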
OK, I should have thought of this earlier. Both XmlTextReader (which gives us the Encoding) and XmlReader.Create (which lets us specify the encoding) accept a Stream. So how about first opening a FileStream and then using it with both XmlTextReader and XmlReader, like this:
// FileAccess.Read: FileMode.Open alone requests read/write access, which fails on read-only files
using (var txtreader = new FileStream(filepath, FileMode.Open, FileAccess.Read))
{
    using (var xmlreader = new XmlTextReader(txtreader))
    {
        // Read in the encoding info
        xmlreader.MoveToContent();
        var encoding = xmlreader.Encoding;

        // Rewind to the beginning
        txtreader.Seek(0, SeekOrigin.Begin);

        var settings = new XmlReaderSettings { NameTable = new NameTable() };
        var xmlns = new XmlNamespaceManager(settings.NameTable);
        var context = new XmlParserContext(null, xmlns, "", XmlSpace.Default,
            encoding);
        // Parse the same stream again, now with the detected encoding
        using (var reader = XmlReader.Create(txtreader, settings, context))
        {
            return XElement.Load(reader);
        }
    }
}
This works like a charm. Reading XML files in an encoding-independent way could have been more elegant, but at least the file is now opened only once.
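For experimenting, the single-open approach can be wrapped in a self-contained program. This is a sketch rather than the poster's exact code: LoadXml is a hypothetical local-function name, the out parameter is added only to expose the detected encoding, the temporary ISO-8859-1 sample file is made up for the demo, and it assumes C# 9 top-level statements.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Linq;

// Hypothetical wrapper around the answer's approach: one FileStream, probed by
// XmlTextReader, rewound, then parsed by XmlReader.Create with the detected encoding.
static XElement LoadXml(string filepath, out Encoding encoding)
{
    using (var stream = new FileStream(filepath, FileMode.Open, FileAccess.Read))
    using (var probe = new XmlTextReader(stream))
    {
        probe.MoveToContent();
        encoding = probe.Encoding;

        stream.Seek(0, SeekOrigin.Begin); // rewind instead of reopening the file

        var settings = new XmlReaderSettings { NameTable = new NameTable() };
        var xmlns = new XmlNamespaceManager(settings.NameTable);
        var context = new XmlParserContext(null, xmlns, "", XmlSpace.Default, encoding);
        using (var reader = XmlReader.Create(stream, settings, context))
        {
            return XElement.Load(reader);
        }
    }
}

// Sample input: an ISO-8859-1 file in which "é" is the single byte 0xE9 (invalid as UTF-8).
string path = Path.GetTempFileName();
File.WriteAllText(path,
    "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><root>caf\u00E9</root>",
    Encoding.GetEncoding("ISO-8859-1"));

XElement root = LoadXml(path, out Encoding detected);
Console.WriteLine($"{detected.WebName}: {root.Value}");
File.Delete(path);
```

Because XmlReader itself honours a byte order mark and the encoding attribute of the XML declaration, passing the stream straight to XElement.Load is likely to give the same result for files like this sample; the probe-and-rewind step mainly makes the detected encoding visible to the caller.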