How can I use LINQ to XML to query huge XML files with reasonable memory consumption?


I've not done much with LINQ to XML, but all the examples I've seen load the entire XML document into memory.

What if the XML file is, say, 8GB, and you really don't have the option?

My first thought is to use the XElement.Load(TextReader) method in combination with an instance of the FileStream class.

QUESTION: Will this work, and is this the right way to approach the problem of searching a very large XML file?

Note: high performance isn't required. I'm trying to get LINQ to XML to basically do the work of a program I could write myself, one that loops through every line of my big file and gathers up what it finds. Since LINQ is "loop centric", I'd expect this to be possible.


asked Apr 30 '11 00:04

Aaron Anodide

2 Answers

Using XElement.Load will load the whole file into memory. Instead, use XmlReader with the XNode.ReadFrom function: XmlReader walks the file forward one node at a time, and you selectively load only the nodes you care about as XElement objects for further processing, if you need to. MSDN has a very good example doing just that: http://msdn.microsoft.com/en-us/library/system.xml.linq.xnode.readfrom.aspx
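A minimal sketch of that XmlReader plus XNode.ReadFrom pattern follows. It is not the MSDN sample itself; the names `XmlStreaming`, `StreamElements`, `IdsOfLargeOrders`, and the `order`/`id`/`total` names are assumptions made up for illustration. Only one matching element is held as an XElement at a time, so a LINQ query over the result stays streaming.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

public static class XmlStreaming
{
    // Lazily yields every element called elementName, one XElement at a time.
    // (Hypothetical helper; an outer match is returned whole, including any
    // nested elements of the same name.)
    public static IEnumerable<XElement> StreamElements(TextReader input, string elementName)
    {
        using (XmlReader reader = XmlReader.Create(input))
        {
            reader.MoveToContent();
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == elementName)
                {
                    // ReadFrom consumes the element and leaves the reader on the
                    // node that follows it, so Read() must not be called here.
                    yield return (XElement)XNode.ReadFrom(reader);
                }
                else
                {
                    reader.Read();
                }
            }
        }
    }

    // Example LINQ query over the stream: ids of orders whose total exceeds a limit.
    // The element and attribute names are assumptions for this sketch.
    public static List<string> IdsOfLargeOrders(TextReader input, int limit)
    {
        return StreamElements(input, "order")
            .Where(o => (int)o.Attribute("total") > limit)
            .Select(o => (string)o.Attribute("id"))
            .ToList();
    }
}
```

For a real file you would pass something like `new StreamReader(path)` instead of an in-memory reader. Avoid operators that buffer the whole sequence (for example `OrderBy` or `ToList` on the unfiltered stream), since those would bring the memory problem back.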

If you just need to search the XML document, XmlReader alone will suffice and will not load the whole document into memory.
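As a rough illustration of searching with XmlReader alone, the sketch below counts elements whose attribute has a given value without building any tree. `XmlSearch` and `CountElementsWithAttribute` are hypothetical names, not part of the framework.

```csharp
using System;
using System.IO;
using System.Xml;

public static class XmlSearch
{
    // Forward-only scan: counts elements named elementName whose attribute
    // attributeName equals value. Nothing is retained between nodes.
    public static int CountElementsWithAttribute(
        TextReader input, string elementName, string attributeName, string value)
    {
        int count = 0;
        using (XmlReader reader = XmlReader.Create(input))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element
                    && reader.Name == elementName
                    && reader.GetAttribute(attributeName) == value) // null when the attribute is absent
                {
                    count++;
                }
            }
        }
        return count;
    }
}
```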

116

answered Nov 15 '22 17:11

Teoman Soygul

Gabriel,

Dude, this isn't exactly answering your ACTUAL question (how to read big XML docs using LINQ), but you might want to check out my old question, "What's the best way to parse big XML documents in C-Sharp". The last "answer" (timewise) was a "note to self" on what ACTUALLY WORKED. It turns out that a hybrid of XmlReader for the document and XmlSerializer for the doclets is fast (enough) AND flexible.
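For what it's worth, here is one way such a hybrid could look. This is a guess at the shape of the technique, not the code from that old question: XmlReader walks the document, and each small fragment is handed to XmlSerializer. The `Order` class, the `order`/`customer` names, and `HybridReader` are all invented for the sketch.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical "doclet" type; XmlSerializer needs it to be public.
[XmlRoot("order")]
public class Order
{
    [XmlAttribute("id")] public int Id { get; set; }
    [XmlElement("customer")] public string Customer { get; set; }
}

public static class HybridReader
{
    // Streams the document with XmlReader and deserializes one <order> at a time.
    public static IEnumerable<Order> ReadOrders(TextReader input)
    {
        var serializer = new XmlSerializer(typeof(Order));
        using (XmlReader reader = XmlReader.Create(input))
        {
            reader.MoveToContent();
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "order")
                {
                    // ReadSubtree confines the serializer to this one element.
                    using (XmlReader fragment = reader.ReadSubtree())
                    {
                        yield return (Order)serializer.Deserialize(fragment);
                    }
                    // Closing the subtree leaves the outer reader on the end of
                    // this element; step past it before continuing.
                    reader.Read();
                }
                else
                {
                    reader.Read();
                }
            }
        }
    }
}
```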

BUT note that I was only dealing with docs of up to 150MB. If you REALLY have to handle docs as big as 8GB, then I guess you're likely to encounter all sorts of problems, including issues with the O/S's LARGE_FILE (>2GB) handling... in which case I strongly suggest you keep things as-primitive-as-possible... and XmlReader is the most primitive (and, according to my testing, THE fastest) XML parser available in the Microsoft namespace.

Also: I've just noticed a belated comment in my old thread suggesting that I check out VTD-XML... I had a quick look at it just now... It "looks promising", even if the author seems to have contracted a terminal case of FIGJAM. He claims it'll handle docs of up to 256GB, to which I reply "Yeah, have you TESTED it? In WHAT environment?" It sounds like it should work though... I've used this same technique to implement "hyperlinks" in a textual help-system, back before HTML.

Anyway good luck with this, and your overall project. Cheers. Keith.

answered Nov 15 '22 18:11

corlettk
