How can I use LINQ to XML to query huge XML files with reasonable memory consumption?

Tags: c#, xml, linq

I've not done much with LINQ to XML, but all the examples I've seen load the entire XML document into memory.

What if the XML file is, say, 8GB, and loading it all really isn't an option?

My first thought is to use the XElement.Load(TextReader) method in combination with an instance of the FileStream class.

QUESTION: Will this work, and is this the right way to approach the problem of searching a very large XML file?

Note: high performance isn't required. I'm trying to get LINQ to XML to basically do the work of the program I could write that loops through every line of my big file and gathers up the matches, but since LINQ is "loop centric" I'd expect this to be possible.

Aaron Anodide asked Apr 30 '11 00:04



2 Answers

XElement.Load will load the whole file into memory. Instead, use XmlReader with the XNode.ReadFrom method: XmlReader walks the file, and you selectively load the nodes it finds into XElement objects for further processing, if you need to. MSDN has a very good example doing just that: http://msdn.microsoft.com/en-us/library/system.xml.linq.xnode.readfrom.aspx
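A minimal sketch of that XmlReader plus XNode.ReadFrom pattern, assuming a file made of many small repeated elements. The names `XmlStreaming`, `StreamElements`, `IdsAbovePrice`, the `item` element, and the `id` and `price` attributes are all placeholders for illustration, not something from the question or from MSDN. Only one matching element is held as an XElement at a time, so ordinary LINQ operators can run over the stream:

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

public static class XmlStreaming
{
    // Lazily yields each element with the given local name, one at a time.
    // The caller owns (and disposes) the TextReader.
    public static IEnumerable<XElement> StreamElements(TextReader input, string elementName)
    {
        using (XmlReader reader = XmlReader.Create(input))
        {
            reader.MoveToContent();
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.LocalName == elementName)
                {
                    // ReadFrom builds an XElement for this element only and leaves
                    // the reader on the node that follows it, so no extra Read() here.
                    yield return (XElement)XNode.ReadFrom(reader);
                }
                else
                {
                    reader.Read();
                }
            }
        }
    }

    // Example LINQ query over the stream (hypothetical schema):
    // ids of <item> elements whose price attribute exceeds a threshold.
    public static IEnumerable<string> IdsAbovePrice(TextReader input, decimal threshold)
    {
        return StreamElements(input, "item")
            .Where(e => (decimal?)e.Attribute("price") > threshold)
            .Select(e => (string)e.Attribute("id"));
    }
}
```

Note the loop calls Read() only when the current node is not a match. Calling Read() unconditionally after ReadFrom would skip every second element when matches sit directly next to each other.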

If you just need to search the XML document, XmlReader alone will suffice and will not load the whole document into memory.
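For a pure search, a forward-only pass with XmlReader might look like the sketch below. The `XmlSearch` class, the `CountMatches` method, and its parameters are hypothetical names chosen for illustration; it simply counts elements whose attribute has a given value, without ever building a tree:

```csharp
using System.IO;
using System.Xml;

public static class XmlSearch
{
    // Counts elements named elementName whose attributeName equals value.
    // Memory use stays flat because only the current node is held.
    public static long CountMatches(TextReader input, string elementName,
                                    string attributeName, string value)
    {
        long count = 0;
        using (XmlReader reader = XmlReader.Create(input))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element
                    && reader.LocalName == elementName
                    && reader.GetAttribute(attributeName) == value)
                {
                    count++;
                }
            }
        }
        return count;
    }
}
```

GetAttribute returns null when the attribute is missing, so elements without it are not counted.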

Teoman Soygul answered Nov 15 '22 17:11


Gabriel,

Dude, this isn't exactly answering your ACTUAL question (how to read big XML docs using LINQ) but you might want to check out my old question "What's the best way to parse big XML documents in C-Sharp". The last "answer" (timewise) was a "note to self" on what ACTUALLY WORKED. It turns out that a hybrid of XmlReader for the document and XmlSerializer for the doclets is fast (enough) AND flexible.

BUT note that I was dealing with docs of only up to 150MB. If you REALLY have to handle docs as big as 8GB, then I guess you're likely to encounter all sorts of problems, including issues with the O/S's LARGE_FILE (>2GB) handling... in which case I strongly suggest you keep things as primitive as possible... and XmlReader is as primitive as it gets (and THE fastest according to my testing) among the XML parsers available in the .NET framework.

Also: I've just noticed a belated comment in my old thread suggesting that I check out VTD-XML... I had a quick look at it just now... It "looks promising", even if the author seems to have contracted a terminal case of FIGJAM. He claims it'll handle docs of up to 256GB; to which I reply "Yeah, have you TESTED it? In WHAT environment?" It sounds like it should work though... I've used this same technique to implement "hyperlinks" in a textual help-system, back before HTML.
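The old thread isn't reproduced here, so the following is only one guess at what such a hybrid could look like: XmlReader walks the big document, and XmlSerializer turns each small sub-element ("doclet") into an object. The `Item` class, its `id` attribute and `name` child element, and the `HybridReader` name are all assumptions made up for this sketch:

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical record type; element and attribute names are assumptions.
[XmlRoot("item")]
public class Item
{
    [XmlAttribute("id")]
    public string Id { get; set; }

    [XmlElement("name")]
    public string Name { get; set; }
}

public static class HybridReader
{
    // Walks the document with XmlReader and deserializes one <item> at a time.
    public static IEnumerable<Item> ReadItems(TextReader input)
    {
        var serializer = new XmlSerializer(typeof(Item));
        using (XmlReader reader = XmlReader.Create(input))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element && reader.LocalName == "item")
                {
                    // ReadSubtree scopes a reader to this single element; disposing it
                    // leaves the outer reader at the end of that element, so the next
                    // Read() moves on to whatever follows.
                    using (XmlReader subtree = reader.ReadSubtree())
                    {
                        yield return (Item)serializer.Deserialize(subtree);
                    }
                }
            }
        }
    }
}
```

Because the method is an iterator, items are produced lazily and can be filtered with LINQ to Objects while only one deserialized record is alive at a time.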

Anyway good luck with this, and your overall project. Cheers. Keith.

corlettk answered Nov 15 '22 18:11