Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

.doc to pdf using python

I'am tasked with converting tons of .doc files to .pdf. And the only way my supervisor wants me to do this is through MSWord 2010. I know I should be able to automate this with python COM automation. Only problem is I dont know how and where to start. I tried searching for some tutorials but was not able to find any (May be I might have, but I don't know what I'm looking for).

Right now I'm reading through this. Dont know how useful this is going to be.

like image 748
nik Avatar asked May 15 '11 20:05

nik


People also ask

Can Python produce a PDF?

Fortunately, the Python ecosystem has some great packages for reading, manipulating, and creating PDF files. In this tutorial, you'll learn how to: Read text from a PDF.

How do I read a .DOC file in Python?

You can use python-docx2txt library to read text from Microsoft Word documents. It is an improvement over python-docx library as it can, in addition, extract text from links, headers and footers. It can even extract images. You can install it by running: pip install docx2txt .

How do I print to PDF in Python?

You can use the Adobe Reader command line print option. If you leave the printer name blank, reader will print to the default printer. You can use the Adobe Reader command line print option.


1 Answers

A simple example using comtypes, converting a single file, input and output filenames given as commandline arguments:

import sys import os import comtypes.client  wdFormatPDF = 17  in_file = os.path.abspath(sys.argv[1]) out_file = os.path.abspath(sys.argv[2])  word = comtypes.client.CreateObject('Word.Application') doc = word.Documents.Open(in_file) doc.SaveAs(out_file, FileFormat=wdFormatPDF) doc.Close() word.Quit() 

You could also use pywin32, which would be the same except for:

import win32com.client 

and then:

word = win32com.client.Dispatch('Word.Application') 
like image 56
Steven Avatar answered Sep 20 '22 03:09

Steven