 

An efficient way to convert documents to PDF format


I have been trying to find an efficient way to convert documents (e.g. doc, docx, ppt, pptx) to PDF. So far I have tried docsplit and oowriter, but both took more than 10 seconds to convert a 1.7 MB pptx file. Can anyone suggest a better tool, or ways to improve my approach?

What I have tried:

    from subprocess import Popen, PIPE
    import time

    def convert(src, dst):
        d = {'src': src, 'dst': dst}
        commands = [
            '/usr/bin/docsplit pdf --output %(dst)s %(src)s' % d,
            'oowriter --headless -convert-to pdf:writer_pdf_Export %(dst)s %(src)s' % d,
        ]

        for i in range(len(commands)):
            command = commands[i]
            st = time.time()
            process = Popen(command, stdout=PIPE, stderr=PIPE, shell=True) # I am aware of consequences of using `shell=True`
            out, err = process.communicate()
            errcode = process.returncode
            if errcode != 0:
                raise Exception(err)
            en = time.time() - st
            print 'Command %s: Completed in %s seconds' % (str(i+1), str(round(en, 2)))

    if __name__ == '__main__':
        src = '/path/to/source/file/'
        dst = '/path/to/destination/folder/'
        convert(src, dst)

Output:

    Command 1: Completed in 11.91 seconds
    Command 2: Completed in 11.55 seconds

Environment:

  • Linux - Ubuntu 12.04
  • Python 2.7.3

Results from other tools:

  • jodconverter took 11.32 seconds
Aamir Rind asked Jan 02 '14 21:01




2 Answers

Try calling unoconv from your Python code. It took about 8 seconds on my local machine; I don't know if that's fast enough for you:

    time unoconv 15.\ Text-Files.pptx
    real    0m8.604s
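Calling it from Python could look roughly like the sketch below. It assumes unoconv is on the PATH and that the installed version accepts a directory for `-o`; `build_unoconv_command` and `convert_to_pdf` are hypothetical helper names made up for this example, not part of unoconv.

```python
import subprocess
import time


def build_unoconv_command(src, out_dir):
    """Return the argument list that converts `src` to PDF into `out_dir`."""
    # -f selects the output format, -o the output location.
    return ['unoconv', '-f', 'pdf', '-o', out_dir, src]


def convert_to_pdf(src, out_dir, run=subprocess.check_call):
    """Run unoconv and return the elapsed wall-clock time in seconds.

    `run` is injectable (an assumption of this sketch) so the function can
    be exercised without unoconv installed. The default, check_call, raises
    CalledProcessError when the command exits with a non-zero status.
    """
    start = time.time()
    # Passing a list avoids shell=True and copes with spaces in file names.
    run(build_unoconv_command(src, out_dir))
    return time.time() - start
```

If you convert many files, starting a listener once with `unoconv --listener` lets later calls reuse the running office instance instead of launching a new one per file, which may remove a good part of the startup cost.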
avenet answered Sep 29 '22 16:09



Pandoc is a wonderful tool capable of doing what you'd like quickly. Since you're already using Popen to shell out to the conversion tool, it doesn't matter what language the tool is written in (Pandoc is written in Haskell).
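A minimal sketch of shelling out to it, assuming pandoc is on the PATH, a PDF engine it can use (LaTeX by default) is installed, and the input is a format pandoc can read, such as docx. `build_pandoc_command` and `pandoc_to_pdf` are hypothetical names used only for this example.

```python
import subprocess


def build_pandoc_command(src, dst):
    """Return the argument list for `pandoc SRC -o DST`."""
    # pandoc infers the input format from src's extension and the
    # output format from dst's extension (.pdf here).
    return ['pandoc', src, '-o', dst]


def pandoc_to_pdf(src, dst, run=subprocess.check_call):
    """Convert `src` to `dst` with pandoc and return the output path.

    `run` is injectable (an assumption of this sketch) so the function can
    be tried without pandoc installed. The default, check_call, raises
    CalledProcessError when pandoc exits with a non-zero status.
    """
    run(build_pandoc_command(src, dst))
    return dst
```

You would want to time it on your own files before relying on it, since the PDF step goes through a separate engine.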

jeffknupp answered Sep 29 '22 17:09
