I want to load large XML documents into XDocument objects.
The simple synchronous approach using XDocument.Load(path, loadOptions) works great, but blocks for an uncomfortably long time in a GUI context when loading large files (particularly from network storage).
I wrote this async version with the intention of improving responsiveness in document loading, particularly when loading files over the network.
public static async Task<XDocument> LoadAsync(String path, LoadOptions loadOptions = LoadOptions.PreserveWhitespace)
{
    String xml;
    using (var stream = File.OpenText(path))
    {
        xml = await stream.ReadToEndAsync();
    }
    return XDocument.Parse(xml, loadOptions);
}
However, on a 200 MB raw XML file loaded from local disk, the synchronous version completes in a few seconds. The asynchronous version (running in a 32-bit context) instead throws an OutOfMemoryException:
at System.Text.StringBuilder.ToString()
at System.IO.StreamReader.<ReadToEndAsyncInternal>d__62.MoveNext()
I imagine this is because of the temporary string variable used to hold the raw XML in memory for parsing by the XDocument. Presumably in the synchronous scenario, XDocument.Load() is able to stream through the source file, and never needs to create a single huge String to hold the entire file.
Is there any way to get the best of both worlds? Load the XDocument with fully asynchronous I/O, and without needing to create a large temporary string?
Late answer, but I needed the async read as well on a "legacy" .NET Framework version, so I figured out a way to read the content truly asynchronously without resorting to buffering the whole XML text in memory.
The writer provided by XDocument.CreateWriter() does not support async writing, so XmlWriter.WriteNodeAsync() fails on it. The code below therefore performs async reads and passes the results to synchronous writes on the XDocument writer; its structure is modeled on the way XmlWriter.WriteNodeAsync() works. Because the writer only builds an in-memory DOM, its writes never wait on I/O, so writing synchronously is actually preferable to doing async writes here.
public static async Task<XDocument> LoadAsync(Stream stream, LoadOptions loadOptions) {
    using (var reader = XmlReader.Create(stream, new XmlReaderSettings() {
        DtdProcessing = DtdProcessing.Ignore,
        IgnoreWhitespace = (loadOptions & LoadOptions.PreserveWhitespace) == LoadOptions.None,
        XmlResolver = null,
        CloseInput = false,
        Async = true
    })) {
        var result = new XDocument();
        using (var writer = result.CreateWriter()) {
            do {
                switch (reader.NodeType) {
                    case XmlNodeType.Element:
                        writer.WriteStartElement(reader.Prefix, reader.LocalName, reader.NamespaceURI);
                        writer.WriteAttributes(reader, true);
                        if (reader.IsEmptyElement) {
                            writer.WriteEndElement();
                        }
                        break;
                    case XmlNodeType.Text:
                        writer.WriteString(await reader.GetValueAsync().ConfigureAwait(false));
                        break;
                    case XmlNodeType.CDATA:
                        writer.WriteCData(reader.Value);
                        break;
                    case XmlNodeType.EntityReference:
                        writer.WriteEntityRef(reader.Name);
                        break;
                    case XmlNodeType.ProcessingInstruction:
                    case XmlNodeType.XmlDeclaration:
                        writer.WriteProcessingInstruction(reader.Name, reader.Value);
                        break;
                    case XmlNodeType.Comment:
                        writer.WriteComment(reader.Value);
                        break;
                    case XmlNodeType.DocumentType:
                        writer.WriteDocType(reader.Name, reader.GetAttribute("PUBLIC"), reader.GetAttribute("SYSTEM"), reader.Value);
                        break;
                    case XmlNodeType.Whitespace:
                    case XmlNodeType.SignificantWhitespace:
                        writer.WriteWhitespace(await reader.GetValueAsync().ConfigureAwait(false));
                        break;
                    case XmlNodeType.EndElement:
                        writer.WriteFullEndElement();
                        break;
                }
            } while (await reader.ReadAsync().ConfigureAwait(false));
        }
        return result;
    }
}
XDocument.LoadAsync() is available in .NET Core 2.0 and later: https://learn.microsoft.com/en-us/dotnet/api/system.xml.linq.xdocument.loadasync?view=netcore-2.0
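On a runtime that has XDocument.LoadAsync(), a call could look like the following minimal sketch. The class name XmlLoading, the method name LoadFromFileAsync, and the 4096-byte buffer size are illustrative assumptions, not anything from the framework or the question. The file is opened with useAsync: true so that reads are requested as asynchronous I/O where the platform supports it.

```csharp
using System.IO;
using System.Threading;
using System.Threading.Tasks;
using System.Xml.Linq;

public static class XmlLoading // hypothetical helper class, not part of the framework
{
    public static async Task<XDocument> LoadFromFileAsync(
        string path,
        LoadOptions loadOptions = LoadOptions.PreserveWhitespace,
        CancellationToken cancellationToken = default(CancellationToken))
    {
        // useAsync: true asks for asynchronous file I/O where the OS supports it.
        // The buffer size of 4096 bytes is an arbitrary choice for this sketch.
        using (var stream = new FileStream(
            path, FileMode.Open, FileAccess.Read, FileShare.Read,
            bufferSize: 4096, useAsync: true))
        {
            // The document is parsed directly from the stream, so no single
            // string holding the whole file is created along the way.
            return await XDocument.LoadAsync(stream, loadOptions, cancellationToken)
                .ConfigureAwait(false);
        }
    }
}
```

From a GUI event handler this would be awaited as `var doc = await XmlLoading.LoadFromFileAsync(path);`. The resulting XDocument still holds the full tree in memory, so only the temporary string is avoided, not the cost of the DOM itself.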
First of all, the task is not being run asynchronously. You would need to use either a built-in async I/O call or spin up a task on the thread pool yourself. For example:
public static Task<XDocument> LoadAsync(String path, LoadOptions loadOptions = LoadOptions.PreserveWhitespace)
{
    return Task.Run(() =>
    {
        using (var stream = File.OpenText(path))
        {
            return XDocument.Load(stream, loadOptions);
        }
    });
}
And because this uses the TextReader overload of XDocument.Load rather than XDocument.Parse, you don't get a temporary string holding the entire file.