I need to read XLSX files and extract a maximum amount of content from it. Which of the API's should I use?
OLE DB, open XML SDK, or Excel Interop?
The Open XML SDK provides . NET developers with a set of strongly typed classes that make it easy to read, write and manipulate the parts and content in an Open XML document such as the DOCX, XLSX or PPTX files created by Microsoft Office. It can be used in any .
The Open XML SDK 2.5 simplifies the task of manipulating Open XML packages and the underlying Open XML schema elements within a package. The Open XML SDK 2.5 encapsulates many common tasks that developers perform on Open XML packages, so that you can perform complex operations with just a few lines of code.
ClosedXML is a . NET library for reading, manipulating and writing Excel 2007+ (. xlsx, . xlsm) files. It aims to provide an intuitive and user-friendly interface to dealing with the underlying OpenXML API.
You can try all of them and choose the one that fits you most...
Depending on data you want to read, I'd suggest you to use Open XML over Interop or Ole DB.
I don't know an open XML SDK, although I've some experience with EPPlus library which I'm using a lot and can say only good words about it - it's fast, easy to learn, with good examples. The library is basing on Open Office XML format, so I suppose it's pretty much the same as the SDK you've mentioned, and is capable of easy read and write Excel 2007 and 2010 files.
On the linked web, you'll find a library itself, documentation and some example "Hello World" projects to download.
Why that library in the first place? Because with it you will be able to read not only cells values, but also their colors, fonts, widths and heights, merging and all that detailed stuff, that you can not only read, but modify as well. What's more, you don't need the Excel installed to do that.
On the second place - just in case you need to extract tabular data from worksheet - you may play with OLE DB. I'm afraid with that you won't be able to extract any info about formats, colors etc., as well as the data must be in a tabular organized worksheet, so you can treat is as a database's table.
The last one is Interop, because:
- it's a COM library, so you need to be very careful when playing with it via .NET, as it's easy to cause some ugly and hard to find memory leaks (confirmed by myself bad experience) - if you don't dispose their objects properly, it leaves the Excel.exe process opened,
- it's much slower than previous methods,
- basically, it has almost no more added value that one of the previous methods (EPPlus or OleDB) and requires Excel to be installed on client's machine, so why to use it?
Good luck, then.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With