Convert a DOCX byte stream to a PDF byte stream in Python

I currently have a program that generates a .docx document using the python-docx library.

After building the .docx file, I save it into a byte stream like so:

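import io

file_stream = io.BytesIO()
document.save(file_stream)  # write the .docx into memory instead of to disk
file_stream.seek(0)  # rewind so the stream can be read from the start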

Now I need to convert this Word document to a PDF. I have looked at a few conversion libraries, such as docx2pdf, and at doing it manually with comtypes, like so:

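import os
import comtypes.client

wdFormatPDF = 17  # Word's WdSaveFormat value for PDF

# Word resolves relative paths against its own default folder, so pass absolute paths
in_file = os.path.abspath("Input_file_path.docx")
out_file = os.path.abspath("output_file_path.pdf")

word = comtypes.client.CreateObject('Word.Application')
try:
    doc = word.Documents.Open(in_file)
    doc.SaveAs(out_file, FileFormat=wdFormatPDF)
    doc.Close()
finally:
    word.Quit()  # shut down the Word process even if the conversion fails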

The problem is that I need to do this conversion in memory; I cannot save the DOCX or the PDF to disk. Every converter I've seen requires a path to a file on disk, and I don't have one.

Is there a way to convert the DOCX byte stream into a PDF byte stream entirely in memory?

Thanks

asked Nov 16 '22 at 05:11 by johnstoia


1 Answer

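This method is a little convoluted, but it works entirely in memory, and it gives you the option to add custom CSS to style the final document: convert the DOCX byte stream to HTML using mammoth, then convert the resulting HTML to PDF using pdfkit.
Keep in mind that mammoth produces simple, semantic HTML (for example, a Heading 1 paragraph becomes an <h1>) and deliberately ignores details such as fonts, text sizes and colours, so the PDF will not look exactly like the Word document. Also, pdfkit is a wrapper around the wkhtmltopdf command-line tool, which has to be installed separately.

Here's an example: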

# create a dummy docx file
from docx import Document
document = Document()
document.add_paragraph('Lorem ipsum dolor sit amet.')

# create a bytestream
import io
file_stream = io.BytesIO()
document.save(file_stream)
file_stream.seek(0)

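# convert the docx to html
import mammoth
result = mammoth.convert_to_html(file_stream)

# >>> result.value
# '<p>Lorem ipsum dolor sit amet.</p>'
# (result.messages lists any warnings generated during the conversion)

# convert html to pdf; passing False as the output path makes pdfkit
# return the PDF as bytes instead of writing it to a file
import pdfkit
pdf = pdfkit.from_string(result.value, False)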
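To take advantage of the custom CSS option, one approach is to wrap the HTML fragment from `result.value` in a complete page with an embedded `<style>` element before passing it to pdfkit, so no stylesheet file has to be written to disk. The helper below is a minimal sketch; the name `build_html_document` and its parameters are hypothetical and not part of mammoth or pdfkit.

```python
import html


def build_html_document(body_html: str, css: str = "", title: str = "Document") -> str:
    """Wrap an HTML fragment (e.g. mammoth's result.value) in a full HTML page.

    The CSS is embedded in a <style> element, so the whole page stays an
    in-memory string that can be passed straight to pdfkit.from_string().
    The css argument is inserted as-is, so it must be trusted input.
    """
    return (
        "<!DOCTYPE html>\n"
        "<html>\n<head>\n"
        '<meta charset="utf-8">\n'
        f"<title>{html.escape(title)}</title>\n"  # escape &, <, > in the title
        f"<style>\n{css}\n</style>\n"
        "</head>\n<body>\n"
        f"{body_html}\n"
        "</body>\n</html>\n"
    )
```

With pdfkit and wkhtmltopdf installed, the page would then be converted as before, e.g. `pdf = pdfkit.from_string(build_html_document(result.value, css="body { font-family: serif; }"), False)`. Embedding the CSS keeps everything in memory, whereas pdfkit's own `css` argument expects a path to a stylesheet file.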

If you want to write the PDF to a file instead, just do:

with open('test.pdf', 'wb') as file:
    file.write(pdf)
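Since the question asks for a PDF byte stream rather than a file, the bytes can instead be kept in memory by wrapping them in `io.BytesIO`. The helper below is a hypothetical sketch (`as_pdf_stream` is a made-up name); it also checks for the `%PDF-` header that PDF files start with, as a cheap sanity check that the conversion actually produced a PDF.

```python
import io


def as_pdf_stream(pdf_bytes: bytes) -> io.BytesIO:
    """Wrap PDF bytes (e.g. from pdfkit.from_string(html, False)) in a file-like stream."""
    if not isinstance(pdf_bytes, (bytes, bytearray)):
        # guard against being handed something other than raw bytes
        raise TypeError(f"expected bytes, got {type(pdf_bytes).__name__}")
    if not pdf_bytes.startswith(b"%PDF-"):
        raise ValueError("data does not start with the %PDF- signature")
    # a BytesIO created from initial bytes is already positioned at offset 0
    return io.BytesIO(pdf_bytes)
```

For example, `pdf_stream = as_pdf_stream(pdf)` gives a file-like object positioned at the start, which can be read with `pdf_stream.read()` or handed to any API that accepts a binary file object.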
answered Dec 06 '22 at 23:12 by Matias Agelvis