I would like to implement an in-browser Microsoft Word document merge feature that converts the merged document into PDF and offers it to the user for download. I would like this process to be supported in Google Chrome and Firefox. Here is how I would like it to work:
- Client-side JavaScript obtains the Word template document in docx format, either from a server, or by asking the user for a file upload (which it can then read using the FileReader API)
- The JavaScript uses its local data structures (e.g., data lists it has obtained via Ajax) to expand the template into a document. It can do this either directly, by unzipping the docx file and processing its contents, or using DOCx.js. The template expansion is just a matter of substituting template variables with values obtained from the local data structures.
- The JavaScript then converts the expanded template into PDF.
- The JavaScript offers the PDF file to the user for download, e.g., using Downloadify.
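As an illustration of the template expansion in step 2, here is a minimal sketch that works on the XML text of the document part after it has been unzipped. The `{{name}}` placeholder syntax and the helper names `escapeXml` and `expandTemplate` are assumptions made for this example; they are not part of DOCx.js or of the docx format.

```javascript
// Escape characters that are special in XML so that substituted values
// cannot break the document markup.
function escapeXml(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

// Replace every {{name}} placeholder (an assumed syntax) with the matching
// value from `data`. Placeholders with no matching key are left untouched.
function expandTemplate(xml, data) {
  const placeholder = /\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}/g;
  return xml.replace(placeholder, (match, name) =>
    Object.prototype.hasOwnProperty.call(data, name)
      ? escapeXml(data[name])
      : match
  );
}

// Example: a fragment of document XML and a data record obtained via Ajax.
const fragment = '<w:t>Dear {{firstName}} {{lastName}}</w:t>';
console.log(expandTemplate(fragment, { firstName: 'Ann', lastName: 'Lee & Co' }));
```

One caveat: Word sometimes splits a run of text across several XML elements, so a placeholder typed in the template may not appear as one contiguous string in the XML. A real implementation would probably need to handle that case.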
The difficulty I am having is in step 3. My understanding (based on all the Googling I have done so far) is that I have the following options:
- Require that the local machine is a Windows machine, and invoke Word on it, to convert to PDF. This can be done using a little bit of scripting using WScript.shell, and it looks doable with Internet Explorer. But based on what I have read, it doesn't look like I can call WScript.shell from within either Chrome or Firefox, because of their security constraints.
- I am open to trying Silverlight to do the conversion, but I have not found enough documentation on how to do this. Ideally, if I used Silverlight, I would like to write the Silverlight code in JavaScript, because (a) I don't know much C#, and (b) I think it would be much easier in JavaScript.
- Create a web service that will convert a given docx file to a PDF file, and invoke that service via Ajax. I would rather not do this, if possible, for a few reasons: (a) I tried using docx4j (I am a reasonably skilled Java programmer) but the conversion process is far too slow, and it does not preserve document content very well; and (b) I would like to avoid a call out to the network, to avoid security issues. It does seem possible to write a little service on a Windows server for doing the conversion, and if there is no other good option, I might go that route.
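For reference, the web service option could be called from the page roughly as follows. This is only a sketch: the service URL, the choice of sending the raw docx bytes as the request body, and the helper names `buildConversionRequest` and `convertViaService` are all assumptions, since no such service exists yet.

```javascript
// MIME type registered for .docx files.
const DOCX_MIME =
  'application/vnd.openxmlformats-officedocument.wordprocessingml.document';

// Build the options for a POST request that carries the docx bytes
// and asks for a PDF in return.
function buildConversionRequest(docxBytes) {
  return {
    method: 'POST',
    headers: { 'Content-Type': DOCX_MIME, Accept: 'application/pdf' },
    body: docxBytes,
  };
}

// Send the document to a (hypothetical) conversion service and resolve
// with the PDF bytes as an ArrayBuffer. `fetchImpl` can be swapped out
// for testing.
async function convertViaService(serviceUrl, docxBytes, fetchImpl = globalThis.fetch) {
  const response = await fetchImpl(serviceUrl, buildConversionRequest(docxBytes));
  if (!response.ok) {
    throw new Error(`Conversion failed with HTTP status ${response.status}`);
  }
  return response.arrayBuffer();
}

// Usage (the URL is a placeholder):
// convertViaService('/convert', docxBytes).then((pdf) => { /* offer download */ });
```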
If I have been unclear about anything, please let me know. I would appreciate your ideas and feedback.
I love command line tools.
Upload the doc to your server and use LibreOffice to convert it to PDF via the command line:
soffice.exe --headless --convert-to pdf --outdir E:\Docs\Out E:\Docs\In\a.doc
You can display a progress bar to the user and when complete give them the option to download the doc.
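If the server side happens to run Node.js (an assumption; any language that can start a process would do), the command above could be wrapped along these lines. The helper names `buildConvertArgs` and `convertToPdf` are made up for this sketch; the flags are the same ones used in the command.

```javascript
const { execFile } = require('child_process');

// Build the argument list for LibreOffice, mirroring the command line above.
function buildConvertArgs(inputPath, outDir) {
  return ['--headless', '--convert-to', 'pdf', '--outdir', outDir, inputPath];
}

// Run LibreOffice and resolve with its standard output once the conversion
// process exits. `sofficePath` defaults to "soffice" and is expected to be
// on the PATH; pass a full path to soffice.exe on Windows if it is not.
function convertToPdf(inputPath, outDir, sofficePath = 'soffice') {
  return new Promise((resolve, reject) => {
    execFile(sofficePath, buildConvertArgs(inputPath, outDir), (error, stdout) => {
      if (error) {
        reject(error);
        return;
      }
      resolve(stdout);
    });
  });
}

// Usage, with the paths from the command above:
// convertToPdf('E:\\Docs\\In\\a.doc', 'E:\\Docs\\Out', 'soffice.exe')
//   .then(() => console.log('PDF written to E:\\Docs\\Out'));
```

Passing the arguments as an array to `execFile` avoids going through a shell, so file names containing spaces or special characters do not need extra quoting.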
More info on LibreOffice's command-line parameters is available in the LibreOffice documentation.
Done.