
Statically compile pdftk for Heroku: need to split a PDF into single-page files

So we're using Heroku to host our Rails application, and we've moved to the Cedar stack. This stack does not have the pdftk library installed. I contacted support and was told to statically compile it for amd64 Ubuntu and include it in my application.

This has proved more difficult than I thought. Initially I downloaded the Ubuntu package (http://packages.ubuntu.com/natty/pdftk), extracted it, and included the binary as well as the shared libraries. I'm getting strange errors like:

Unhandled Java Exception:
java.lang.NullPointerException
   at com.lowagie.text.pdf.PdfCopy.copyIndirect(pdftk)
   at com.lowagie.text.pdf.PdfCopy.copyObject(pdftk)
   at com.lowagie.text.pdf.PdfCopy.copyDictionary(pdftk)

I'm assuming this is because some of the dependencies aren't installed?

So here are my questions:

  1. Is there an easier way to statically compile a library? Or do I need to move over its binary as well as all of its libraries and dependencies?
  2. I'm just trying to split a multi-page PDF into single-page files in Ruby. Is there a way to do this without pdftk? Or am I stuck trying to statically compile pdftk?

Thanks for the help. I know this isn't an easy problem, but I would really appreciate help with this one. I've wasted close to 6 hours trying to get this damn thing to work.

Binary Logic asked Aug 20 '11 01:08


4 Answers

Unfortunately, Heroku keeps stripping out magic to add flexibility. As a result, it feels more and more like the days when I used to manage and maintain my own servers. There is no easy solution. My "monkey patch" is to send the file to a server on which I can install pdftk, process the file there, and send it back. Not great, but it works. Having to deal with this defeats the purpose of using Heroku.

Binary Logic answered Oct 02 '22 19:10


The easy solution is to add the one pdftk dependency that is not found on Heroku:

$ ldd pdftk
    linux-vdso.so.1 =>  (0x00007ffff43ca000)
    libgcj.so.10 => not found
    libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0x00007f1d26d48000)
    libm.so.6 => /lib/libm.so.6 (0x00007f1d26ac4000)
    libgcc_s.so.1 => /lib/libgcc_s.so.1 (0x00007f1d268ad000)
    libc.so.6 => /lib/libc.so.6 (0x00007f1d2652a000)
    libpthread.so.0 => /lib/libpthread.so.0 (0x00007f1d2630c000)
    /lib64/ld-linux-x86-64.so.2 (0x00007f1d27064000)
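Reading the ldd listing by eye works for a single binary, but the check can also be scripted. Below is a minimal Ruby sketch that runs ldd on a binary and collects every library reported as "not found"; in the listing above that is only libgcj.so.10. The method names missing_libraries and missing_libraries_for are hypothetical helpers, not part of any gem.

```ruby
require "open3"

# Hypothetical helper: return the names of shared libraries that ldd
# reports as missing, i.e. lines shaped like "    libgcj.so.10 => not found".
def missing_libraries(ldd_output)
  ldd_output.scan(/^[ \t]*(\S+)[ \t]+=>[ \t]+not found/).flatten
end

# Hypothetical helper: run ldd on a binary (e.g. "bin/pdftk") and parse it.
# ldd's exit status is ignored here; only its listing is inspected.
def missing_libraries_for(binary)
  output, _status = Open3.capture2e("ldd", binary)
  missing_libraries(output)
end
```

Running missing_libraries_for against the extracted pdftk binary, in an environment that matches the dyno, shows which .so files still need to be copied into the app.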

I put pdftk and libgcj.so.10 into the bin directory of my app (/app/bin on the dyno). You then just need to tell Heroku to search that directory when loading shared libraries.

Run the following to see what LD_LIBRARY_PATH is currently set to:

$ heroku config
LD_LIBRARY_PATH:             /app/.heroku/vendor/lib
LIBRARY_PATH:                /app/.heroku/vendor/lib

Then append /app/bin (or whichever directory holds libgcj.so.10) to it:

$ heroku config:set LD_LIBRARY_PATH=/app/.heroku/vendor/lib:/app/bin
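As an alternative to changing the app-wide config var, the extra library directory can be passed only to the pdftk process. A minimal Ruby sketch, assuming pdftk and libgcj.so.10 sit in /app/bin as described above; the helper names (library_path_with, burst_command, burst) are hypothetical, while burst is pdftk's own operation that writes one PDF per page, which is what the question asks for.

```ruby
require "open3"
require "fileutils"

# Assumption: pdftk and libgcj.so.10 were committed to the app's bin/ dir.
PDFTK_DIR = "/app/bin"

# Append dir to a colon-separated library path unless it is already there.
def library_path_with(dir, current = ENV["LD_LIBRARY_PATH"])
  parts = current.to_s.split(":").reject(&:empty?)
  parts << dir unless parts.include?(dir)
  parts.join(":")
end

# argv for `pdftk INPUT burst output DIR/page_%04d.pdf` (one file per page).
def burst_command(input_pdf, output_dir, pdftk: File.join(PDFTK_DIR, "pdftk"))
  [pdftk, input_pdf, "burst", "output", File.join(output_dir, "page_%04d.pdf")]
end

# Run pdftk with LD_LIBRARY_PATH set for this child process only and
# return the generated page files in page order.
def burst(input_pdf, output_dir)
  FileUtils.mkdir_p(output_dir)
  env = { "LD_LIBRARY_PATH" => library_path_with(PDFTK_DIR) }
  output, status = Open3.capture2e(env, *burst_command(input_pdf, output_dir))
  raise "pdftk failed: #{output}" unless status.success?
  Dir.glob(File.join(output_dir, "page_*.pdf")).sort
end
```

The zero-padded %04d pattern keeps the file names in page order when sorted, so the returned array can be used directly.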

The downside is that my slug size grew from 15.9 MB to 27.5 MB.

Dan G answered Oct 02 '22 19:10


We encountered the same problem. Our solution was to use Stapler (https://github.com/hellerbarde/stapler) instead: it's a Python utility and only requires one extra module (pyPdf) to be installed on Heroku.

I was pointed to this blog entry: http://theprogrammingbutler.com/blog/archives/2011/07/28/running-pdftotext-on-heroku/

Here are the steps I followed to install pyPdf:

Access the Heroku bash console:

heroku run bash

Install the latest version of pyPdf:

cd tmp
curl http://pybrary.net/pyPdf/pyPdf-1.13.tar.gz -o pyPdf-1.13.tar.gz
tar zxvf pyPdf-1.13.tar.gz
cd pyPdf-1.13
python setup.py install --user

This puts all the necessary files under a .local directory at the root of the app. I downloaded that directory and added it to our Git repo, along with the Stapler utility. Finally, I updated my code to use Stapler instead of pdftk, et voilà! Splitting PDFs on Heroku again.

Another, probably cleaner, way would be to encapsulate it in a gem (http://news.ycombinator.com/item?id=2816783).

Bastien answered Oct 02 '22 20:10


I read a similar question on SO, and found this approach by Ryan Daigle that worked for me as well: instead of building local binaries that are hard to match to Heroku's servers, use the remote environment to compile and build the required dependencies. This is accomplished using the Vulcan gem, which is provided by Heroku.

Ryan's article "Building Dependency Binaries for Heroku Applications"

Another approach, by Jon Magic (untested by me), is to download and compile the dependency through Heroku's bash, i.e. directly on the server: "Compiling Executables on Heroku".

On a side note, both approaches produce binaries that will break if Heroku's underlying environment changes enough.

Arman H answered Oct 02 '22 19:10