
How can I read MS Office files on a server without installing MS Office and without using the Interop library?

The interop library is slow and needs MS Office installed. Many times you don't want to install MS Office on servers.

I'd like to use Apache POI, but I'm on .NET.

I only need to extract the text from the files; I'm not creating Office files or storing information in them.

I should add that I have a very large document library, and I can't convert it to the newer XML formats.

I don't want to write a parser for the binary files. A library like Apache POI does this for us. Unfortunately, it is only available for the Java platform. Maybe I should consider writing this application in Java.

I still haven't found an open source alternative to POI for .NET, so I think I'll write my own application in Java.

Luca Molteni asked Sep 30 '08 13:09



2 Answers

For all MS Office versions:

  • You could use third-party components such as TX Text Control for Word and TMS FlexCel Studio for Excel.

For the new Office (2007):

  • You could do some basic things using the .NET functionality in System.IO.Packaging. See how at http://msdn.microsoft.com/en-us/library/bb332058.aspx
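
To illustrate why the 2007 formats are approachable: a .docx file is a ZIP package whose main part is the XML file word/document.xml, and System.IO.Packaging exposes those same parts on .NET. Below is a minimal sketch of that idea, written in Java because the question mentions moving to it, and using only the standard library. The class and method names (DocxTextSketch, extractText) are made up for this example, and the sketch assumes a plain document: it reads only the main part and ignores headers, footnotes, tabs and line breaks.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper for illustration only; not part of any library.
class DocxTextSketch {
    // Main WordprocessingML namespace used by Office 2007 .docx files.
    static final String W_NS =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Returns the text of word/document.xml, one line per paragraph. */
    static String extractText(InputStream docx) throws Exception {
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (!entry.getName().equals("word/document.xml")) {
                    continue; // skip styles, media, relationships, ...
                }
                // Copy the part into memory (readAllBytes needs Java 9+).
                byte[] part = zip.readAllBytes();
                DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
                factory.setNamespaceAware(true);
                factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
                Document xml = factory.newDocumentBuilder()
                                      .parse(new ByteArrayInputStream(part));
                return paragraphsToText(xml);
            }
        }
        throw new IOException("no word/document.xml part found");
    }

    // Joins the <w:t> text runs of every <w:p> paragraph.
    private static String paragraphsToText(Document xml) {
        StringBuilder out = new StringBuilder();
        NodeList paragraphs = xml.getElementsByTagNameNS(W_NS, "p");
        for (int i = 0; i < paragraphs.getLength(); i++) {
            Element paragraph = (Element) paragraphs.item(i);
            NodeList runs = paragraph.getElementsByTagNameNS(W_NS, "t");
            for (int j = 0; j < runs.getLength(); j++) {
                out.append(runs.item(j).getTextContent());
            }
            out.append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java DocxTextSketch.java <file.docx>");
            return;
        }
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            System.out.print(extractText(in));
        }
    }
}
```

This only covers the 2007 XML formats. The older binary .doc and .xls files are not ZIP packages, so they still need a dedicated parser or one of the libraries mentioned in this thread.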

For the old Office (before 2007):

  • The old Office formats are now documented: http://www.microsoft.com/interop/docs/officebinaryformats.mspx. If you only need to do something really simple, you might consider parsing them yourself. But be aware that these formats are VERY complex.
Ilya Kochetov answered Sep 29 '22 01:09



Check out the Aspose components. They are designed to mimic the Interop functionality without requiring a full Office install on a server.

Jason Z answered Sep 29 '22 01:09
