I can't find any packages to do this. I know PHP has a ton of libraries for PDFs (like http://www.fpdf.org/) but anything for Node?
You can use Aspose. Words Cloud SDK for Node. js to extract text from DOC/DOCX,Open Office, and PDF. It's paid API but the free plan provides 150 free monthly API calls.
First install NodeJS file system. Second is pdf reader. Install Xlsx for reading Xls, xlsx workbooks. node-stream-zip is to read doc and Docx file.
textract is a great lib that supports PDFs, Doc, Docx, etc.
Looks like there's a few for pdf, but I didn't find any for Word.
CPU bound processing like that isn't really Node's strong point anyway (i.e. you get no additional benefits using node to do it over any other language). A pragmatic approach would be to find a good tool and utilise it from Node.
I have heard good things around the office about docsplit http://documentcloud.github.com/docsplit/
While it's not Node, you could easily invoke it from Node with http://nodejs.org/docs/latest/api/all.html#child_process.exec
You can easily convert one into another, or use for example a .doc template to generate a .pdf file, but you will probably want to use an existing web service for this task.
This can be done using the services of Livedocx for example
To use this service from node, see node-livedocx (Disclaimer: I am the author of this node module)
I would suggest looking into unoconv for your initial conversion, this uses LibreOffice or OpenOffice for the actual conversion. Which adds some overhead.
I'd setup a few workers with all the necessities setup, and use a request/response queue for handling the conversion... (may want to look into kue or zmq)
In general this is a CPU bound and heavy task that should be offloaded... Pandoc and others specifically mention .docx
, not .doc
so they may or may not be options as well.
Note: I know this question is old, just wanted to provide a current answer for others coming across this.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With