How to extract plain text from MS word document file in pure C++? [closed]

Tags:

c++

Is there any pure C++ library to extract plain text from a .doc file?

I'm developing a C++ program to read .doc and .pdf files. I have to extract plain text from the file and write it into a .txt file.

ilango j asked Nov 24 '11 04:11



1 Answer

You could have a look at wv, the open source C library used by AbiWord.

You can also call out to a batch conversion tool:

  • Open source batch converter, based on OpenOffice: http://dag.wieers.com/home-made/unoconv/
  • Open source, for Unix: http://www.wagner.pp.ru/~vitus/software/catdoc/
  • Proprietary, for Windows: http://doc2txt.com/ (note that I haven't tried this one)
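
If you go the "call out to a tool" route, the C++ side could look roughly like the sketch below. It assumes a POSIX system (for popen) and that catdoc is installed and on the PATH, where running it with a file name prints the extracted text to standard output. The helper names shell_quote, run_and_capture and doc_to_text are made up for this illustration and are not part of any library.

```cpp
#include <stdio.h>    // popen, pclose, fread (POSIX)
#include <stdexcept>
#include <string>

// Quote a string for a POSIX shell: wrap it in single quotes and
// rewrite each embedded single quote as '\''.
std::string shell_quote(const std::string& s) {
    std::string out = "'";
    for (char c : s) {
        if (c == '\'') out += "'\\''";
        else out += c;
    }
    out += "'";
    return out;
}

// Run a shell command and return everything it wrote to stdout.
// Throws if the command cannot be started or reports a non-zero status.
std::string run_and_capture(const std::string& command) {
    FILE* pipe = popen(command.c_str(), "r");
    if (!pipe) throw std::runtime_error("popen failed: " + command);

    std::string output;
    char buffer[4096];
    size_t n;
    while ((n = fread(buffer, 1, sizeof buffer, pipe)) > 0) {
        output.append(buffer, n);
    }

    if (pclose(pipe) != 0) {
        throw std::runtime_error("command failed: " + command);
    }
    return output;
}

// Hypothetical wrapper: assumes the catdoc executable is on the PATH.
std::string doc_to_text(const std::string& doc_path) {
    return run_and_capture("catdoc " + shell_quote(doc_path));
}
```

Calling doc_to_text("report.doc") and writing the returned string through a std::ofstream would then give you the .txt file. Quoting the path matters because the command goes through the shell, so file names containing spaces or quotes would otherwise break it. The same run_and_capture helper should work for other command-line converters as long as they print their result to stdout.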

gnud answered Sep 20 '22 14:09