
How to make pdf generation faster with clj-pdf?

Tags:

clojure

I am using the clj-pdf library to generate a PDF with 40000 pages, with the same two images on each page. It takes around 1 minute 30 seconds to generate the PDF, which is too slow considering we used to do it faster with Python. What can I do to make it faster?

Here is the REPL session:

user=> (defn 
  #_=>  gen-pdf
  #_=>  []
  #_=>  (println (new java.util.Date))
  #_=>  (pdf [{}  (for [i (range 80000)] (do  [:paragraph [:image "sample_logos/batman.jpeg"] [:image "sample_logos/superman.jpeg"] ] ) )] "super.pdf")
  #_=>  (println (new java.util.Date)))
#'user/gen-pdf
user=> (gen-pdf)
#inst "2013-12-26T07:03:05.695-00:00"
#inst "2013-12-26T07:04:23.175-00:00"
nil
user=> 
Amogh Talpallikar asked Dec 26 '13 07:12


1 Answer

UPDATE: The author of clj-pdf kindly added support for references to the library. Here is the updated code using version "1.11.9" of clj-pdf:

(defn gen-pdf []
  (time
   (pdf [{:references {:batman [:image "sample_logos/batman.jpeg"]
                       :superman [:image "sample_logos/superman.jpeg"]}}
         (for [i (range 80000)]
           [:paragraph
            [:reference :batman]
            [:reference :superman]])]
        "super.pdf")))

This finishes in 12 seconds on my machine.


I ran your example using [clj-pdf "1.11.7"]; it took about 68 seconds and generated a 5.4 GB file.

Then I created a Python sample:

from reportlab.pdfgen import canvas
from datetime import datetime

batman = "sample_logos/batman.jpeg"
superman = "sample_logos/superman.jpeg"
n = 80000

def hello(c):
    for i in range(0, n):
        c.drawImage(batman, 0,0)
        c.showPage()
    for i in range(0, n):
        c.drawImage(superman, 0,0)
        c.showPage()

t1 = datetime.now()
c = canvas.Canvas("super_py.pdf")
hello(c)
c.save()
t2 = datetime.now()

print (t2 - t1)

It is roughly equivalent. Using Python 2.7.5+ and reportlab 2.7, it took 53 seconds and generated a 108 MB file.
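The Python sample above times the run by subtracting two datetime.now() values. As a minimal sketch of an alternative, assuming Python 3.3 or newer (the sample above was run on 2.7), a small helper built on time.perf_counter plays a role similar to Clojure's time macro. The name measure is hypothetical and not part of reportlab.

```python
import time


def measure(fn, *args, **kwargs):
    """Call fn(*args, **kwargs) and return (result, elapsed_seconds).

    Hypothetical helper for illustration only; it uses a monotonic clock,
    so the measurement is not affected by system clock adjustments.
    """
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


if __name__ == "__main__":
    # Stand-in workload; in the sample above this would wrap hello(c) and c.save().
    total, seconds = measure(sum, range(1000000))
    print("result=%d elapsed=%.3f s" % (total, seconds))
```

Wrapping both the page loop and the save call in one measured function keeps the timing comparable to the clj-pdf numbers, which include writing the file.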

reportlab reuses the same image, so I changed clj-pdf to accept an iText Image in the :image tag; see https://github.com/yogthos/clj-pdf/blob/master/src/clj_pdf/core.clj#L461

I added another condition that passes Image instances through as-is:

(let [img (cond
            (instance? Image img-data)
            img-data
            (instance? java.awt.Image img-data)
            (Image/getInstance (.createImage ...

and changed the code to:

(defn gen-pdf []
  (let [batman (Image/getInstance "sample_logos/batman.jpeg")
        superman (Image/getInstance "sample_logos/superman.jpeg")]
    (time
     (pdf [{}
           (for [i (range 80000)]
             [:paragraph
              ;; [:image "sample_logos/batman.jpeg"]
              ;; [:image "sample_logos/superman.jpeg"]
              [:image batman]
              [:image superman]])]
          "super.pdf"))))

This optimization let me generate the PDF in 17 seconds, with a file size of 70 MB.
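To illustrate why sharing one image object shrinks the file so much, here is a toy model in Python. It is not clj-pdf, iText, or reportlab code: the function names, the image sizes, and the per-placement reference cost are all made-up assumptions, chosen only to show how the two strategies scale.

```python
# Toy model of PDF image embedding. All names and byte counts are hypothetical.


def size_embed_every_time(image_sizes, placements, ref_size=50):
    """Every placement stores its own full copy of the image data."""
    return sum(image_sizes[name] + ref_size for name in placements)


def size_embed_once(image_sizes, placements, ref_size=50):
    """Each distinct image is stored once; placements only store a reference."""
    distinct = set(placements)
    stored = sum(image_sizes[name] for name in distinct)
    return stored + ref_size * len(placements)


# Made-up image sizes in bytes, and two images per paragraph as in the question.
image_sizes = {"batman": 30000, "superman": 35000}
placements = ["batman", "superman"] * 80000

naive = size_embed_every_time(image_sizes, placements)
shared = size_embed_once(image_sizes, placements)
print("copy per placement: %d bytes" % naive)
print("shared image data:  %d bytes" % shared)
```

In this model the first strategy grows with image size times the number of placements, while the second grows only with the number of placements, which matches the direction of the drop from gigabytes to tens of megabytes reported above.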

edbond answered Nov 19 '22 13:11