 

What is the largest XML file SSIS can extract data from?

Tags:

xml

ssis

We have an architecture where we use SSIS to extract data from XML batch files into a staging database for validation, prior to exporting it into production.

We control the XML format to some extent, and I've been asked to determine what the maximum number of records the XML batch file should contain. Based on the XML schema and some sample data, I can estimate the average record size and do some projections from there.

However, coming at it from the other angle, I'd like to get an indication of the technical limitations of SSIS when dealing with large XML files.

I'm aware that SSIS will flatten and transform the XML document into its own tabular, in-memory representation, so RAM becomes an obvious limiting factor — but in what proportion?

Can you say, for example, that SSIS requires at least 2.5 times the size of the file you're trying to load in available memory? Assuming I have a 32 GB box dedicated to this data-loading function, how large can my XML files be?

I'm aware that there may be other factors involved, such as the complexity of the schema, the number of nested elements, etc., but it'd be nice to have a starting point.
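The projection described above can be sketched as simple arithmetic. This is only an illustration: the 2.5x overhead factor is the hypothetical figure from the question, not a documented SSIS number, and the function names are made up for this example.

```python
# Back-of-envelope sizing for an XML batch file, assuming (hypothetically)
# that the loader needs `overhead` times the file size in available RAM.

def max_file_size_gb(available_ram_gb, overhead=2.5):
    """Largest XML file that fits within the assumed memory overhead."""
    return available_ram_gb / overhead

def max_records(available_ram_gb, avg_record_bytes, overhead=2.5):
    """Record count that keeps the file within the assumed size limit."""
    budget_bytes = max_file_size_gb(available_ram_gb, overhead) * 1024**3
    return int(budget_bytes // avg_record_bytes)

print(max_file_size_gb(32))    # 12.8 GB under the 2.5x assumption
print(max_records(32, 2048))   # roughly 6.7 million 2 KB records
```

Swapping in a measured average record size from the sample data gives a concrete batch-size ceiling for any assumed overhead factor.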

Asked Mar 26 '09 by Christophe Chuvan

1 Answer

The XML Source does not load the whole document into memory; it streams data out as it reads the XML file. So if you are reading the XML and writing it to, e.g., text files without complex transformations, you need relatively little memory. Also, the amount of memory you need stops growing (after some threshold) as the XML file grows, so you can potentially handle XML files of unlimited size.
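SSIS's internals aren't shown here, but the streaming pattern the answer describes can be illustrated with Python's standard-library `xml.etree.ElementTree.iterparse`, which likewise emits elements one at a time instead of building the whole document tree. The `<record>` tag name and the counting logic are invented for this sketch.

```python
# Sketch of the streaming pattern described above, using Python's stdlib
# parser in place of SSIS's XML Source: each <record> element is handled
# and cleared as it is read, so memory use stays roughly flat no matter
# how large the input file is.
import xml.etree.ElementTree as ET
from io import BytesIO

def count_records(xml_stream, tag="record"):
    count = 0
    for event, elem in ET.iterparse(xml_stream, events=("end",)):
        if elem.tag == tag:
            count += 1
            elem.clear()  # drop the element's contents to keep memory bounded
    return count

sample = BytesIO(b"<batch><record id='1'/><record id='2'/><record id='3'/></batch>")
print(count_records(sample))  # 3
```

The key point is the same as in the answer: as long as each record is processed and discarded as it arrives, total memory does not scale with file size.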

For example, this post describes importing the entire Wikipedia content (a 20 GB XML file): http://www.ideaexcursion.com/2009/01/26/import-wikipedia-articles-into-sql-server-with-ssis/

Of course, you will probably do something with that data, e.g. join multiple streams coming out of the XML Source. Depending on what you need, that may require a lot of memory, because some transforms keep the whole dataset in memory, or perform much better when the whole dataset fits in memory.

Answered Sep 28 '22 by Michael Entin