
Merge Existing PDF into new ReportLab PDF via flowables

I have a ReportLab SimpleDocTemplate that I return as a dynamically generated PDF. I am generating its content based on some Django model metadata. Here's my template setup:

buff = StringIO()
doc = SimpleDocTemplate(buff, pagesize=letter,
                        rightMargin=72,leftMargin=72,
                        topMargin=72,bottomMargin=18)
Story = []

I can easily add textual metadata from the Entry model into the Story list to be built later:

    ptext = '<font size=20>%s</font>' % entry.title.title()
    paragraph = Paragraph(ptext, custom_styles["Custom"])
    Story.append(paragraph)

And then generate the PDF to be returned in the response by calling build on the SimpleDocTemplate:

doc.build(Story, onFirstPage=entry_page_template, onLaterPages=entry_page_template)

pdf = buff.getvalue()
resp = HttpResponse(mimetype='application/x-download')    
resp['Content-Disposition'] = 'attachment;filename=logbook.pdf'
resp.write(pdf)
return resp

One metadata field on the model is a file attachment. When an attachment is a PDF, I'd like to merge it into the Story that I am generating, i.e. add the PDF as a ReportLab "flowable".

I'm attempting to do so using pdfrw, but haven't had any luck. Ideally I'd love to just call:

from pdfrw import PdfReader
pdf = PdfReader(entry.document.file.path)
Story.append(pdf)

and append the pdf to the existing Story list to be included in the generation of the final document, as noted above.

Anyone have any ideas? I tried something similar using pagexobj to create the pdf, trying to follow this example:

http://code.google.com/p/pdfrw/source/browse/trunk/examples/rl1/subset.py

from pdfrw.buildxobj import pagexobj
from pdfrw.toreportlab import makerl

pdf = pagexobj(PdfReader(entry.document.file.path))

But I didn't have any luck with that either. Can someone explain the best way to merge an existing PDF file into a ReportLab flowable? I'm no good with this stuff and have been banging my head on PDF generation for days now. :) Any direction greatly appreciated!

asked Nov 13 '12 at 20:11 by kyleturner


2 Answers

I just had a similar task in a project. I used ReportLab (open source version) to generate the PDF files and pyPdf to do the merge. My requirements were slightly different in that I only needed one page from each attachment, but this should be close enough to give you the general idea.

from datetime import datetime

from django.conf import settings
from pyPdf import PdfFileReader, PdfFileWriter

def create_merged_pdf(user):
    basepath = settings.MEDIA_ROOT + "/"
    # following block calls the function that uses reportlab to generate a pdf
    coversheet_path = basepath + "%s_%s_cover_%s.pdf" %(user.first_name, user.last_name, datetime.now().strftime("%f"))
    create_cover_sheet(coversheet_path, user, user.performancereview_set.all())

    # now use the cover sheet and all of the performance reviews to create a merged pdf
    merged_path = basepath + "%s_%s_merged_%s.pdf" %(user.first_name, user.last_name, datetime.now().strftime("%f"))

    # for merged file result
    output = PdfFileWriter()

    # for each pdf file to add, open in a PdfFileReader object and add page to output
    cover_pdf = PdfFileReader(file( coversheet_path, "rb"))
    output.addPage(cover_pdf.getPage(0))

    # iterate through attached files and merge.  I only needed the first page, YMMV
    for review in user.performancereview_set.all():
        review_pdf = PdfFileReader(file(review.pdf_file.file.name, "rb"))
        output.addPage(review_pdf.getPage(0)) # only first page of attachment

    # write out the merged file
    outputStream = file(merged_path, "wb")
    output.write(outputStream)
    outputStream.close()
answered Nov 15 '22 at 05:11 by RyanBrady


I used the following class to solve my issue. It inserts the PDFs as vector PDF images. It works well because I needed a table of contents, and using a flowable let ReportLab's built-in TOC functionality work without changes.

The class is from the answers to this question: Is there a matplotlib flowable for ReportLab?

Note: If you have multiple pages in the file, you have to modify the class slightly. The sample class is designed to just read the first page of the PDF.

answered Nov 15 '22 at 05:11 by Greg Svitak