Converting docx to pdf with pure python (on linux, without libreoffice)

I'm developing a web app, part of which converts uploaded docx files to PDF (after some processing). With python-docx and other methods, most of the processing needs neither a Windows machine with Word installed nor LibreOffice on Linux (my web server is PythonAnywhere: Linux, but without LibreOffice and without sudo or apt install permissions). Converting to PDF, however, seems to require one of those. From exploring questions here and elsewhere, this is what I have so far:

import os
import subprocess

try:
    from comtypes import client
except ImportError:
    client = None

def doc2pdf(doc):
    """
    convert a doc/docx document to pdf format
    :param doc: path to document
    """
    doc = os.path.abspath(doc) # bugfix - searching files in windows/system32
    if client is None:
        return doc2pdf_linux(doc)
    name, ext = os.path.splitext(doc)
    word = client.CreateObject('Word.Application')
    try:
        worddoc = word.Documents.Open(doc)
        try:
            worddoc.SaveAs(name + '.pdf', FileFormat=17)  # 17 = wdFormatPDF
        finally:
            worddoc.Close()
    finally:
        word.Quit()


def doc2pdf_linux(doc):
    """
    convert a doc/docx document to pdf format (linux only, requires libreoffice)
    :param doc: path to document
    """
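    cmd = 'libreoffice --convert-to pdf'.split() + [doc]
    p = subprocess.Popen(cmd, stderr=subprocess.PIPE, stdout=subprocess.PIPE)
    try:
        # communicate() drains both pipes; wait() with PIPE can deadlock
        stdout, stderr = p.communicate(timeout=10)
    except subprocess.TimeoutExpired:
        p.kill()
        p.communicate()
        raise
    if stderr:
        raise subprocess.SubprocessError(stderr)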

As you can see, one method requires comtypes, another requires libreoffice as a subprocess. Other than switching to a more sophisticated hosting server, is there any solution?

asked Jun 22 '18 06:06

Ofer Sadan

2 Answers

The PythonAnywhere help pages offer information on working with PDF files here: https://help.pythonanywhere.com/pages/PDF

Summary: PythonAnywhere has a number of Python packages for PDF manipulation installed, and one of them may do what you want. However, shelling out to abiword seems easiest to me. The shell command abiword --to=pdf filetoconvert.docx will convert the docx file to a PDF and produce a file named filetoconvert.pdf in the same directory as the docx. Note that this command will output an error message to the standard error stream complaining about XDG_RUNTIME_DIR (or at least it did for me), but it still works, and the error message can be ignored.
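
The abiword approach above could be wrapped from Python roughly as follows. This is a minimal sketch, assuming abiword is on the PATH and writes the PDF next to the input file as described; the function names and the timeout value are illustrative, not part of any library. Because abiword may print an XDG_RUNTIME_DIR complaint to standard error even when it succeeds, the sketch judges success by whether the PDF exists rather than by stderr.

```python
import os
import subprocess


def abiword_command(docx_path):
    """Build the command line described above (hypothetical helper)."""
    return ["abiword", "--to=pdf", docx_path]


def expected_pdf_path(docx_path):
    """abiword is assumed to write <name>.pdf beside the input file."""
    return os.path.splitext(docx_path)[0] + ".pdf"


def docx_to_pdf_abiword(docx_path, timeout=60):
    """Convert docx_path to PDF by shelling out to abiword.

    Returns the path of the PDF. stderr is captured but not treated
    as a failure signal, since warnings there can be harmless.
    """
    docx_path = os.path.abspath(docx_path)
    result = subprocess.run(
        abiword_command(docx_path),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        timeout=timeout,
    )
    pdf_path = expected_pdf_path(docx_path)
    # Assumption: a missing output file means the conversion failed.
    if not os.path.exists(pdf_path):
        raise RuntimeError(
            "abiword did not produce %s: %s"
            % (pdf_path, result.stderr.decode(errors="replace"))
        )
    return pdf_path
```

Note that a PDF left over from an earlier run would also pass the existence check, so a stale file may need to be removed before converting.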

answered Sep 17 '22 14:09

jcgoble3

Another option is LibreOffice; however, as the first responder said, the quality will never be as good as converting with Word itself through comtypes.

Anyway, once you have installed LibreOffice, here is the code to do it:

from subprocess import Popen

# Path to the LibreOffice binary (default Windows install location)
LIBRE_OFFICE = r"C:\Program Files\LibreOffice\program\soffice.exe"

def convert_to_pdf(input_docx, out_folder):
    cmd = [LIBRE_OFFICE, '--headless', '--convert-to', 'pdf', '--outdir',
           out_folder, input_docx]
    print(cmd)  # show the exact command being run
    p = Popen(cmd)
    p.communicate()  # block until the conversion finishes


sample_doc = 'file.docx'
out_folder = 'some_folder'
convert_to_pdf(sample_doc, out_folder)
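
The hard-coded Windows path only works on one machine layout. As a hedged sketch, the binary could instead be looked up on the PATH, and the output location computed up front, since with --outdir LibreOffice names the result after the input file's base name. The helper names and the candidate list below are assumptions for illustration.

```python
import os
import shutil


def find_office_binary(candidates=("soffice", "libreoffice")):
    """Return the first candidate executable found on PATH, or None.

    The candidate names are an assumption; adjust for your install.
    """
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


def converted_pdf_path(input_docx, out_folder):
    """Path where the PDF is expected after --outdir out_folder."""
    base = os.path.splitext(os.path.basename(input_docx))[0]
    return os.path.join(out_folder, base + ".pdf")
```

A caller could fall back to a configured path when find_office_binary() returns None, and check that converted_pdf_path(...) exists once the subprocess has finished.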

answered Sep 16 '22 14:09

dfresh22
