Convert XDocument to byte array (and byte array to XDocument)

I've taken over a system that stores large XML documents in SQL Server in binary format.

Currently the data is saved by converting it to a string and then converting that string to a byte array. But recently, with some large XML documents, I'm getting out-of-memory exceptions when converting to a string, so I want to bypass this step and go straight from the XDocument to a byte array.

The Entity Framework class holding the XML has been extended so that the binary data is accessible as a string like this:

partial class XmlData
{
    public string XmlString { get { return Encoding.UTF8.GetString(XmlBinary); } set { XmlBinary = Encoding.UTF8.GetBytes(value); } }
}

I want to further extend the class to look something like this:

partial class XmlData
{
    public string XmlString { get { return Encoding.UTF8.GetString(XmlBinary); } set { XmlBinary = Encoding.UTF8.GetBytes(value); } }

    public XDocument XDoc
    {
        get
        {
            // Convert XmlBinary to XDocument
        }
        set
        {
            // Convert XDocument to XmlBinary
        }
    }
}
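
One way the two placeholder accessors could be filled in, sketched under a few assumptions: the helper class XmlBinaryConverter and its methods are hypothetical, the XML is stored as UTF-8, and the writer uses new UTF8Encoding(false) so that no byte order mark is written. Disposing the writer (via using) flushes its buffered output before the bytes are read, and neither direction builds an intermediate string.

```csharp
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Linq;

// Hypothetical helper; XmlData.XDoc could delegate to these two methods.
public static class XmlBinaryConverter
{
    // Serializes an XDocument straight to UTF-8 bytes, without an intermediate string.
    public static byte[] ToBytes(XDocument doc)
    {
        var settings = new XmlWriterSettings
        {
            OmitXmlDeclaration = true,
            // new UTF8Encoding(false) = UTF-8 without a byte order mark (BOM).
            Encoding = new UTF8Encoding(false)
        };
        using (var memoryStream = new MemoryStream())
        {
            using (var xmlWriter = XmlWriter.Create(memoryStream, settings))
            {
                doc.WriteTo(xmlWriter);
            } // Disposing the writer flushes any buffered output into memoryStream.
            return memoryStream.ToArray();
        }
    }

    // Parses UTF-8 bytes back into an XDocument, again without an intermediate string.
    public static XDocument FromBytes(byte[] bytes)
    {
        using (var memoryStream = new MemoryStream(bytes))
        using (var xmlReader = XmlReader.Create(memoryStream))
        {
            return XDocument.Load(xmlReader);
        }
    }
}
```

With this in place, the property body could be as short as get { return XmlBinaryConverter.FromBytes(XmlBinary); } and set { XmlBinary = XmlBinaryConverter.ToBytes(value); }.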

I think I've nearly figured out the conversion, but when I use the partial class's XmlString property to read the XML back from the DB, the XML is always cut off near the end, at a different character count each time:

var memoryStream = new MemoryStream();
var xmlWriter = XmlWriter.Create(memoryStream);
myXDocument.WriteTo(xmlWriter);
XmlData.XmlBinary = memoryStream.ToArray();

SOLUTION

Here's the basic conversion:

var settings = new XmlWriterSettings { OmitXmlDeclaration = true, Encoding = Encoding.UTF8 };
using (var memoryStream = new MemoryStream())
using (var xmlWriter = XmlWriter.Create(memoryStream, settings))
{
    myXDocument.WriteTo(xmlWriter);
    xmlWriter.Flush();
    XmlData.XmlBinary = memoryStream.ToArray();
}

But for some reason this conversion adds some strange non-ASCII characters to the XML, so my previous XmlString property returned those characters and XDocument.Parse() failed. My new partial class therefore looks like this:

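using System.IO;
using System.Text;
using System.Text.RegularExpressions;
using System.Xml;
using System.Xml.Linq;

partial class XmlData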
{
    public string XmlString 
    { 
        get 
        {
            var xml = Encoding.UTF8.GetString(XmlBinary);
            xml = Regex.Replace(xml, @"[^\u0000-\u007F]", string.Empty); // Removes ALL non-ASCII characters, including legitimate non-ASCII text
            return xml;
        } 
        set 
        { 
            value = Regex.Replace(value, @"[^\u0000-\u007F]", string.Empty); // Removes ALL non-ASCII characters, including legitimate non-ASCII text
            XmlBinary = Encoding.UTF8.GetBytes(value); 
        } 
    }

    public XDocument XDoc
    {
        get
        {
            using (var memoryStream = new MemoryStream(XmlBinary))
            using (var xmlReader = XmlReader.Create(memoryStream))
            {
                var xml = XDocument.Load(xmlReader);
                return xml;
            }
        }
        set
        {
            var settings = new XmlWriterSettings { OmitXmlDeclaration = true, Encoding = Encoding.UTF8 };
            using (var memoryStream = new MemoryStream())
            using (var xmlWriter = XmlWriter.Create(memoryStream, settings))
            {
                value.WriteTo(xmlWriter);
                xmlWriter.Flush();
                XmlBinary = memoryStream.ToArray();
            }
        }
    }
}
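
The strange characters are most likely the UTF-8 byte order mark (BOM). Encoding.UTF8 carries a three-byte preamble (EF BB BF), XmlWriter writes it at the start of the stream, and Encoding.UTF8.GetString decodes it as the invisible character U+FEFF instead of dropping it, which is what trips up XDocument.Parse(). The regex above does remove it, but it also deletes every other character above U+007F, so accented names or currency symbols in the XML are silently lost. A narrower sketch, assuming the BOM is the only unwanted character, skips just the preamble when decoding; the Utf8Text class and its methods are hypothetical.

```csharp
using System.Text;

// Hypothetical helper for the XmlString getter.
public static class Utf8Text
{
    // True when the bytes start with the UTF-8 preamble EF BB BF.
    public static bool StartsWithBom(byte[] bytes)
    {
        byte[] preamble = Encoding.UTF8.GetPreamble();
        if (bytes.Length < preamble.Length) return false;
        for (int i = 0; i < preamble.Length; i++)
        {
            if (bytes[i] != preamble[i]) return false;
        }
        return true;
    }

    // Decodes UTF-8, skipping a leading BOM but keeping all other non-ASCII characters.
    public static string DecodeWithoutBom(byte[] bytes)
    {
        int offset = StartsWithBom(bytes) ? Encoding.UTF8.GetPreamble().Length : 0;
        return Encoding.UTF8.GetString(bytes, offset, bytes.Length - offset);
    }
}
```

The getter could then be get { return Utf8Text.DecodeWithoutBom(XmlBinary); }, and the setter would not need the regex, because Encoding.UTF8.GetBytes does not emit a BOM. Using new UTF8Encoding(false) in the XmlWriterSettings avoids writing the BOM in the first place.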

asked Jun 18 '14 by Owen


1 Answer

That sounds like the buffer of one of the streams or writers was not flushed during the read or write. Wrap them in using (...) blocks so they are flushed, closed, and disposed automatically, and check that everywhere you finish writing you call .Flush().
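
To illustrate this, the sketch below serializes the same document twice: once reading the MemoryStream while the XmlWriter may still hold buffered output (as in the code from the question), and once after the writer has been disposed. The FlushDemo class and its methods are hypothetical. How much is missing from the unflushed result depends on how much output is still sitting in the writer's internal buffer, which would explain why the cut-off point varied between documents.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Linq;

public static class FlushDemo
{
    // Mirrors the question's code: the stream is read while output may still be buffered in the writer.
    public static byte[] SerializeWithoutFlush(XDocument doc)
    {
        var memoryStream = new MemoryStream();
        var xmlWriter = XmlWriter.Create(memoryStream);
        doc.WriteTo(xmlWriter);
        return memoryStream.ToArray(); // writer never flushed or disposed
    }

    // Disposing the writer at the end of the inner using block flushes its buffer first.
    public static byte[] SerializeWithDispose(XDocument doc)
    {
        using (var memoryStream = new MemoryStream())
        {
            using (var xmlWriter = XmlWriter.Create(memoryStream))
            {
                doc.WriteTo(xmlWriter);
            }
            return memoryStream.ToArray();
        }
    }
}
```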

answered Oct 19 '22 by Lanorkin