
Reading .Doc File using DocumentFormat.OpenXml dll

When I try to read a .doc file using the DocumentFormat.OpenXml DLL, it fails with the error "File contains corrupted data."

The same DLL reads .docx files without problems.

Can the DocumentFormat.OpenXml DLL be used to read .doc files?

using DocumentFormat.OpenXml.Packaging;

string path = @"D:\Data\Test.doc";
string searchKeyWord = @"java";

private bool SearchWordIsMatched(string path, string searchKeyWord)
{
    // Open read-only (second argument false): the search never modifies the file.
    // No try/catch here: "throw ex" only reset the stack trace, so exceptions
    // are simply allowed to propagate to the caller.
    using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(path, false))
    {
        var text = wordDoc.MainDocumentPart.Document.InnerText;
        return text.Contains(searchKeyWord);
    }
}
Shardaprasad Soni asked Apr 02 '12 10:04





2 Answers

The old .doc files use a completely different format from the new .docx files. So, no, you can't use the OpenXml library to read .doc files.

To read them, you would need either to convert the files to .docx first, or to use Office interop instead of the Open XML SDK you're using now.

svick answered Sep 28 '22 01:09



I'm afraid there won't be any better answer than the ones already given. The Microsoft Word DOC format is binary, whereas OpenXML formats such as DOCX are zipped XML files. The OpenXml framework works with the latter only.

As suggested, the only other option is to use Word interop or a third-party library to convert DOC -> DOCX, which you can then process with the OpenXml library.
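
Because the two formats differ at the container level, a file can be checked by its leading bytes before it is handed to the OpenXml library. The following is a minimal sketch, not part of the Open XML SDK: the class name WordFormatSniffer and its methods are hypothetical. It relies on two well-known signatures: legacy binary Office files begin with the OLE2 compound file header, and ZIP archives (which DOCX packages are) begin with "PK\x03\x04". A match only identifies the container, so "zip" means "possibly DOCX", not "definitely a valid DOCX".

```csharp
using System;
using System.IO;

// Hypothetical helper for illustration; not part of DocumentFormat.OpenXml.
public static class WordFormatSniffer
{
    // OLE2 compound file signature, used by legacy binary .doc files.
    private static readonly byte[] Ole2Magic =
        { 0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1 };

    // ZIP local file header signature; .docx packages are ZIP archives.
    private static readonly byte[] ZipMagic = { 0x50, 0x4B, 0x03, 0x04 };

    // Classifies a buffer holding the first bytes of a file.
    // Returns "ole2" (binary, e.g. .doc), "zip" (possibly .docx), or "unknown".
    public static string Detect(byte[] header)
    {
        if (StartsWith(header, Ole2Magic)) return "ole2";
        if (StartsWith(header, ZipMagic)) return "zip";
        return "unknown";
    }

    // Reads up to 8 bytes from the file and classifies them.
    public static string DetectFile(string path)
    {
        byte[] header = new byte[8];
        using (FileStream fs = File.OpenRead(path))
        {
            int read = fs.Read(header, 0, header.Length);
            Array.Resize(ref header, read);
        }
        return Detect(header);
    }

    private static bool StartsWith(byte[] data, byte[] prefix)
    {
        if (data == null || data.Length < prefix.Length) return false;
        for (int i = 0; i < prefix.Length; i++)
        {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }
}
```

Calling WordFormatSniffer.DetectFile(path) before WordprocessingDocument.Open lets the caller route "ole2" files to a conversion step instead of hitting the "File contains corrupted data." exception.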

Adam answered Sep 28 '22 02:09
