OLE DB vs Open XML SDK vs Excel Interop

Tags: c#, oledb, openxml

I need to read XLSX files and extract as much of their content as possible. Which of these APIs should I use?

OLE DB, open XML SDK, or Excel Interop?

  • Which is the easiest to use?
  • Can you retrieve all the information with any one of them, i.e. dates, times, merged cells, tables, pivot tables, etc.?
cecemel asked Apr 28 '12 16:04



1 Answer

You can try all of them and choose the one that fits you best.

Depending on the data you want to read, I'd suggest using Open XML over Interop or OLE DB.
I don't know the Open XML SDK, but I have some experience with the EPPlus library, which I use a lot and can only say good things about: it's fast, easy to learn, and comes with good examples. The library is based on the Office Open XML format, so I suppose it's pretty much the same as the SDK you've mentioned, and it can easily read and write Excel 2007 and 2010 files.
On the project's website you'll find the library itself, documentation, and some example "Hello World" projects to download.

Why that library first? Because with it you can read not only cell values but also their colors, fonts, widths and heights, merged ranges, and similar details, and you can modify all of these as well. What's more, you don't need Excel installed to do that.
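To make that concrete, here is a minimal sketch of reading values, a bit of formatting, and merged ranges with EPPlus. It assumes the EPPlus NuGet package is referenced; the file name `sample.xlsx` and the class name are made up for illustration, and newer EPPlus versions (5 and later) additionally require a license to be configured before use, which is not shown here.

```csharp
using System;
using System.IO;
using System.Linq;
using OfficeOpenXml; // EPPlus NuGet package (assumed to be installed)

class EpplusReadSketch
{
    static void Main(string[] args)
    {
        // Hypothetical file name; pass a real path as the first argument.
        string path = args.Length > 0 ? args[0] : "sample.xlsx";

        // Note: EPPlus 5+ needs its license configured before this point;
        // the exact call depends on the version, so check its documentation.
        using (var package = new ExcelPackage(new FileInfo(path)))
        {
            // First() avoids depending on the index base of Worksheets[...].
            ExcelWorksheet sheet = package.Workbook.Worksheets.First();

            // Dimension is null when the sheet contains no cells.
            if (sheet.Dimension == null)
            {
                Console.WriteLine("Worksheet is empty.");
                return;
            }

            for (int row = sheet.Dimension.Start.Row; row <= sheet.Dimension.End.Row; row++)
            {
                for (int col = sheet.Dimension.Start.Column; col <= sheet.Dimension.End.Column; col++)
                {
                    ExcelRange cell = sheet.Cells[row, col];
                    if (cell.Value == null) continue;

                    // Value is the raw value, Text is the formatted string.
                    Console.WriteLine(
                        $"{cell.Address}: value={cell.Value}, text={cell.Text}, " +
                        $"bold={cell.Style.Font.Bold}, format={cell.Style.Numberformat.Format}");
                }
            }

            // Merged ranges are exposed as address strings such as "A1:B2".
            foreach (string merged in sheet.MergedCells)
            {
                if (merged != null) Console.WriteLine("Merged range: " + merged);
            }

            // Column widths and row heights are available as well.
            Console.WriteLine("Width of column 1: " + sheet.Column(1).Width);
            Console.WriteLine("Height of row 1: " + sheet.Row(1).Height);
        }
    }
}
```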

Second, in case you only need to extract tabular data from a worksheet, you may try OLE DB. I'm afraid you won't be able to extract any information about formats, colors, etc. with it, and the data must be laid out as a table in the worksheet so that you can treat it like a database table.
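A rough sketch of the OLE DB approach looks like this. It assumes Windows with the Microsoft ACE OLE DB 12.0 provider installed, a worksheet named `Sheet1`, and a header row in the first line; the file name and class name are hypothetical.

```csharp
using System;
using System.Data;
using System.Data.OleDb;

class OleDbReadSketch
{
    static void Main(string[] args)
    {
        // Hypothetical file name; pass a real path as the first argument.
        string path = args.Length > 0 ? args[0] : "sample.xlsx";

        // Assumes the ACE OLE DB provider is installed on the machine.
        // HDR=YES treats the first row as column names.
        string connectionString =
            "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" + path +
            ";Extended Properties=\"Excel 12.0 Xml;HDR=YES\"";

        var table = new DataTable();
        using (var connection = new OleDbConnection(connectionString))
        // The worksheet is addressed like a table: sheet name plus '$'.
        using (var adapter = new OleDbDataAdapter("SELECT * FROM [Sheet1$]", connection))
        {
            // Fill opens and closes the connection on its own.
            adapter.Fill(table);
        }

        // Only values come back; formatting, colors and merges are not exposed.
        foreach (DataRow row in table.Rows)
        {
            Console.WriteLine(string.Join(" | ", row.ItemArray));
        }
    }
}
```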

The last one is Interop, because:
- it's a COM library, so you need to be very careful when using it from .NET, as it's easy to cause ugly, hard-to-find memory leaks (confirmed by my own bad experience): if you don't release its objects properly, the Excel.exe process stays open,
- it's much slower than the previous methods,
- it adds almost nothing over the previous methods (EPPlus or OLE DB) and requires Excel to be installed on the client's machine, so why use it?
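
If you do end up using Interop, the cleanup mentioned in the first point might look roughly like the sketch below. It assumes Excel is installed and the project references Microsoft.Office.Interop.Excel; the file name, class name, and the `Release` helper are made up for illustration. Each COM object is kept in its own variable so that it can be released explicitly.

```csharp
using System;
using System.IO;
using System.Runtime.InteropServices;
using Excel = Microsoft.Office.Interop.Excel;

class InteropReadSketch
{
    static void Main(string[] args)
    {
        // Hypothetical file name; Excel generally expects an absolute path.
        string path = Path.GetFullPath(args.Length > 0 ? args[0] : "sample.xlsx");

        Excel.Application app = null;
        Excel.Workbooks books = null;
        Excel.Workbook book = null;
        Excel.Sheets sheets = null;
        Excel.Worksheet sheet = null;
        Excel.Range used = null;

        try
        {
            app = new Excel.Application();
            books = app.Workbooks;   // keep a reference so it can be released
            book = books.Open(path);
            sheets = book.Worksheets;
            sheet = (Excel.Worksheet)sheets[1];
            used = sheet.UsedRange;

            // Value2 is a 1-based object[,] for multi-cell ranges
            // and a single value for a one-cell range.
            var grid = used.Value2 as object[,];
            if (grid != null)
            {
                Console.WriteLine(grid.GetLength(0) + " rows x " + grid.GetLength(1) + " columns");
            }
        }
        finally
        {
            if (book != null) book.Close(false); // close without saving
            if (app != null) app.Quit();

            // Release in reverse order of acquisition; otherwise Excel.exe
            // may stay alive after the program ends.
            Release(used);
            Release(sheet);
            Release(sheets);
            Release(book);
            Release(books);
            Release(app);
        }
    }

    // Hypothetical helper: releases a COM object if it was created.
    static void Release(object comObject)
    {
        if (comObject != null) Marshal.ReleaseComObject(comObject);
    }
}
```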

Good luck, then.

mj82 answered Oct 01 '22 20:10