
EPPlus multiple saves

I'm using the EPPlus library in ASP.NET + C# to create an .xlsx file. My file may contain a few million records, and I'm creating one ExcelWorksheet per 100K records. The problem is that a 40 MB .xlsx file containing almost 1.5M records may take multiple GB of main memory, after which I get an OutOfMemoryException. My question is: how can I reduce the memory consumed (several GB for a 40 MB file)? :-| Or can I save the ExcelPackage gradually and deallocate memory? (It seems that I cannot save more than once; after saving, the whole package is disposed.)

If there is no solution, can anybody suggest an alternative to EPPlus (with most of the same features, like styling, multiple worksheets, etc.)?

Uchiha_Sasuke asked Feb 20 '13

2 Answers

This issue seems to be solved in the latest version of EPPlus (4.0.x)


EDIT: Adding reference links to pages which point to improved memory management in EPPlus 4.0.4.

https://epplus.codeplex.com/releases/view/118053#ReviewsAnchor — user reviews reporting improved memory performance in the 4.x version as compared to 3.x.

https://epplus.codeplex.com/wikipage?title=Roadmap — "Version 4.0: New cellstore to improve insert, delete performance and memory consumption"

http://epplus.codeplex.com/wikipage?title=FAQ&referringTitle=Documentation — explains how to optimize loading a HUGE number of cells; refer to the section "I have a lot of data I want to load. What should I think of to get the best performance?"

Also, I've personally tested EPPlus 4.0.4 today by writing out 1.5 million records in one go, each with 5 numeric columns and 1 DateTime column. The peak working set reported by Windows Task Manager was just 711 MB, and the non-paged pool it showed was only around 75 KB! Of course, I'm not sure whether these numbers capture the full impact of the memory footprint, but they are indicative. The output Excel file was around 59 MB (possibly because I had more columns than the sample data mentioned in your original post).

NOTE: I did get an "OutOfMemoryException" when I tried to write 4.5 Million records of 7 columns in one go!

Is my test rigorous enough? Maybe not... but it works well for me.
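For reference, the kind of bulk-write test described above can be sketched roughly as follows. This is a minimal illustration, not the exact test code; the file name, sheet name, and column layout are assumptions.

```csharp
using System;
using System.IO;
using OfficeOpenXml; // EPPlus 4.x

class BulkWriteDemo
{
    static void Main()
    {
        using (var package = new ExcelPackage(new FileInfo("bulk-test.xlsx")))
        {
            var ws = package.Workbook.Worksheets.Add("Data");
            const int rows = 1500000; // 1.5M records, as in the test above
            for (int r = 1; r <= rows; r++)
            {
                // Fill row-wise, left to right, as the EPPlus FAQ recommends.
                for (int c = 1; c <= 5; c++)
                    ws.Cells[r, c].Value = (long)r * c; // 5 numeric columns
                ws.Cells[r, 6].Value = DateTime.Now;    // 1 DateTime column
            }
            ws.Cells[1, 6, rows, 6].Style.Numberformat.Format = "yyyy-mm-dd hh:mm";
            package.Save(); // single Save() at the end; the package is disposed afterwards
        }
    }
}
```

Note the single Save() call at the end: as the question observes, an ExcelPackage cannot usefully be saved more than once within one export.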


However, one workaround I could think of to overcome the large memory requirements in earlier versions is to split the output: save an .xlsx file for every 100K records, and after saving, start a new file (incrementing a counter in the file name) for the next 100K records.

At the end of your operation, for say a total of 1 million records, you'd end up with 10 files of 100K records each.

It might seem a bit of a hack, but hey, it might be better than having to rewrite your code base to use some other library (free or commercial).
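A rough sketch of that split-and-save workaround might look like this. The record source (an IEnumerable of object arrays) and the file naming are placeholders; adapt them to your data.

```csharp
using System.Collections.Generic;
using System.IO;
using OfficeOpenXml; // EPPlus

static class SplitExporter
{
    const int ChunkSize = 100000; // 100K records per file

    public static void Export(IEnumerable<object[]> records, string baseName)
    {
        int fileIndex = 0, rowInFile = 0;
        ExcelPackage package = null;
        ExcelWorksheet ws = null;

        foreach (var record in records)
        {
            if (rowInFile == 0) // start a new file for each chunk
            {
                fileIndex++;
                var file = new FileInfo(string.Format("{0}-{1}.xlsx", baseName, fileIndex));
                package = new ExcelPackage(file);
                ws = package.Workbook.Worksheets.Add("Data");
            }

            rowInFile++;
            for (int c = 0; c < record.Length; c++)
                ws.Cells[rowInFile, c + 1].Value = record[c];

            if (rowInFile == ChunkSize)
            {
                package.Save();
                package.Dispose(); // release the in-memory package before the next file
                rowInFile = 0;
            }
        }

        if (rowInFile > 0) { package.Save(); package.Dispose(); } // flush the last partial chunk
    }
}
```

Disposing each package before opening the next keeps only one chunk's worth of cells in memory at a time.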

Bharat Mallapur answered Nov 03 '22

Multiple saves will not help. If you save and close the file in the middle of your export process and then invoke garbage collection, you will see that the consumed memory is freed. But if you then open the file again, the process consumes the same amount of memory, and if you try to continue your export, you run into the same problem again.

AFAIK, all modern .NET libraries based on the Open XML SDK perform processing in memory, and so does EPPlus. The only way around this is to process on disk without loading everything into memory.
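One way to do that is the Open XML SDK's forward-only OpenXmlWriter, which streams rows to disk instead of building the whole sheet in memory. This is a minimal sketch of that approach (not the answerer's code, and without the styling EPPlus gives you); the file and sheet names are illustrative.

```csharp
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Spreadsheet;

class StreamingWriterDemo
{
    static void Main()
    {
        using (var doc = SpreadsheetDocument.Create("streamed.xlsx", SpreadsheetDocumentType.Workbook))
        {
            var wbPart = doc.AddWorkbookPart();
            var wsPart = wbPart.AddNewPart<WorksheetPart>();

            // OpenXmlWriter writes elements straight to the part's stream.
            using (var writer = OpenXmlWriter.Create(wsPart))
            {
                writer.WriteStartElement(new Worksheet());
                writer.WriteStartElement(new SheetData());
                for (int r = 1; r <= 1500000; r++)
                {
                    writer.WriteStartElement(new Row());
                    writer.WriteElement(new Cell
                    {
                        DataType = CellValues.Number,
                        CellValue = new CellValue(r.ToString())
                    });
                    writer.WriteEndElement(); // Row
                }
                writer.WriteEndElement(); // SheetData
                writer.WriteEndElement(); // Worksheet
            }

            // Wire the streamed worksheet part into the workbook.
            wbPart.Workbook = new Workbook(
                new Sheets(new Sheet
                {
                    Id = wbPart.GetIdOfPart(wsPart),
                    SheetId = 1,
                    Name = "Data"
                }));
            wbPart.Workbook.Save();
        }
    }
}
```

Because rows are written once and never revisited, memory use stays roughly constant regardless of row count; the trade-off is that you give up EPPlus conveniences like random cell access and its styling API.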

Deilan answered Nov 03 '22