I've taken over a system that stores large XML documents in SQL Server in binary format.
Currently the data is saved by converting it to a string, then converting that string to a byte array. But recently with some large XML documents I'm getting out memory exceptions when attempting to convert to a string, so I want to bypass this process and go straight from the XDocument to a byte array.
The Entity Framework class holding the XML has been extended so that the binary data is accessible as a string like this:
partial class XmlData
{
public string XmlString { get { return Encoding.UTF8.GetString(XmlBinary); } set { XmlBinary = Encoding.UTF8.GetBytes(value); } }
}
I want to further extend the class to look something like this:
partial class XmlData
{
public string XmlString{ get { return Encoding.UTF8.GetString(XmlBinary); } set { XmlBinary = Encoding.UTF8.GetBytes(value); } }
public XDocument XDoc
{
get
{
// Convert XmlBinary to XDocument
}
set
{
// Convert XDocument to XmlBinary
}
}
}
I think I've nearly figured out the conversion, but when I use the partial classes XmlString method to get the XML back from the DB, the XML has always been cut off near the end, always at a different character count:
var memoryStream = new MemoryStream();
var xmlWriter = XmlWriter.Create(memoryStream);
myXDocument.WriteTo(xmlWriter);
XmlData.XmlBinary = memoryStream.ToArray();
SOLUTION
Here's the basic conversion:
var settings = new XmlWriterSettings { OmitXmlDeclaration = true, Encoding = Encoding.UTF8 };
using (var memoryStream = new MemoryStream())
using (var xmlWriter = XmlWriter.Create(memoryStream, settings))
{
myXDocument.WriteTo(xmlWriter);
xmlWriter.Flush();
XmlData.XmlBinary = memoryStream.ToArray();
}
But for some reason in this process, some weird non ascii characters get added to the XML so using my previous XmlString method would load those weird characters and XDocument.Parse() would break, so my new partial class looks like this:
partial class XmlData
{
public string XmlString
{
get
{
var xml = Encoding.UTF8.GetString(XmlBinary);
xml = Regex.Replace(xml, @"[^\u0000-\u007F]", string.Empty); // Removes non ascii characters
return xml;
}
set
{
value = Regex.Replace(value, @"[^\u0000-\u007F]", string.Empty); // Removes non ascii characters
XmlBinary = Encoding.UTF8.GetBytes(value);
}
}
public XDocument XDoc
{
get
{
using (var memoryStream = new MemoryStream(XmlBinary))
using (var xmlReader = XmlReader.Create(memoryStream))
{
var xml = XDocument.Load(xmlReader);
return xml;
}
}
set
{
var settings = new XmlWriterSettings { OmitXmlDeclaration = true, Encoding = Encoding.UTF8 };
using (var memoryStream = new MemoryStream())
using (var xmlWriter = XmlWriter.Create(memoryStream, settings))
{
value.WriteTo(xmlWriter);
xmlWriter.Flush();
XmlBinary = memoryStream.ToArray();
}
}
}
}
That sounds like buffer of one of streams / writers was not flushed during read or write - use using (...)
for autoclose, flush and dispose, and also check that in all places where you finished read / write you've done .Flush()
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With