Is there any pure C++ library to extract plain text from a .doc file?
I'm developing a C++ program to read .doc and .pdf files. I have to extract plain text from the file and write it into a .txt file.
Open the DOCX file and click File > Save As > Computer > Browse. Choose to save the file as Plain Text (for XLSX files, save it as Text (Tab delimited)). Locate and open the text file under the name you saved it as. This text file will contain only the text from your original file, without any formatting.
In Microsoft Word on Windows, click Save As in the File menu. In the Save as type drop-down list, select Plain Text (*.txt). Click Save, and a File Conversion window will open.
Press Ctrl + A on your keyboard to highlight all text in your document. Tip: You can also highlight your entire document by placing your mouse cursor in the left margin and then quickly clicking the left mouse button three times in a row. Press Ctrl + C to copy the entire highlighted selection.
You could have a look at wv, the open source C library used by AbiWord.
You can also call out to a batch conversion tool.
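Calling out to a converter from C++ could look roughly like the sketch below. It assumes a POSIX system (for `popen`/`pclose`; on Windows the counterparts are `_popen`/`_pclose`) and assumes the `antiword` command-line tool is installed, which prints the text of a .doc file to standard output. `antiword` is only an example and is not named in the answer above. The helpers `shell_quote`, `run_and_capture`, and `doc_to_txt` are hypothetical names made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <stdexcept>
#include <stdio.h>   // popen/pclose are POSIX, declared in <stdio.h>
#include <string>

// Wrap a string in single quotes for a POSIX shell, escaping embedded quotes,
// so that paths with spaces or special characters are passed as one argument.
std::string shell_quote(const std::string& s) {
    std::string out = "'";
    for (char c : s) {
        if (c == '\'') out += "'\\''";  // close quote, escaped quote, reopen
        else out += c;
    }
    out += "'";
    return out;
}

// Run a shell command and return everything it wrote to standard output.
// Throws if the process cannot be started or exits with a non-zero status.
std::string run_and_capture(const std::string& command) {
    assert(!command.empty());
    FILE* pipe = popen(command.c_str(), "r");
    if (!pipe) throw std::runtime_error("popen failed: " + command);

    std::string output;
    char buffer[4096];
    std::size_t n;
    while ((n = fread(buffer, 1, sizeof buffer, pipe)) > 0) {
        output.append(buffer, n);
    }
    if (pclose(pipe) != 0) throw std::runtime_error("command failed: " + command);
    return output;
}

// Assumption: `antiword <file>` writes the document text to stdout.
// Converts doc_path to plain text and writes the result to txt_path.
void doc_to_txt(const std::string& doc_path, const std::string& txt_path) {
    const std::string text = run_and_capture("antiword " + shell_quote(doc_path));
    std::ofstream out(txt_path, std::ios::binary);
    if (!out) throw std::runtime_error("cannot open " + txt_path);
    out << text;
}
```

A call such as `doc_to_txt("report.doc", "report.txt")` would then produce the text file, provided the tool is on the `PATH`. The same pattern should work for .pdf input by swapping the command for a PDF converter such as `pdftotext`, again assuming it is installed. This is not a pure C++ solution, since the parsing happens in the external program.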