
Combine multiple PDFs in Linux using a script?

I want to download the PDFs from website X and then combine them all into one file, so that I can read them in one go.

What I did so far:

  1. Get the PDFs from the website

    wget -r -l1 -A.pdf --no-parent http://linktoX

  2. Combine the PDFs into one

    gs -dNOPAUSE -sDEVICE=pdfwrite -sOUTPUTFILE=Combined_`date +%F`.pdf -dBATCH file1.pdf file2.pdf file3.pdf

My problem: I would like to automate all of this in one script so that I don't have to do it by hand every day. New PDFs are added to X daily.

So, how can I do step 2 without giving the full list of PDFs? I tried using file*.pdf in step 2, but that combined the PDFs in what looked like random order.

The next problem is that the number of file*.pdf files is not the same every day: sometimes there are 5 PDFs, sometimes 10. The nice thing is that they are named in order: file1.pdf, file2.pdf, ...

So I need some help completing step 2, such that all the PDFs are combined in order and I don't have to name each one explicitly.

Thanks.

UPDATE: This solved the problem:

    pdftk `ls -rt kanti*.pdf` cat output Kanti.pdf

I used ls -rt because file1.pdf was downloaded first, then file2.pdf, and so on. Plain ls -t put file20.pdf first and file1.pdf last.
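For the daily automation, the two steps could be wrapped in one script. This is only a rough sketch under a few assumptions: the function names oldest_first and fetch_and_merge are made up for illustration, -nd is added so that wget drops the files into the current directory, the file names contain no spaces, and the dated output name is borrowed from step 2. Note that ls -rt sorts by modification time, which wget normally copies from the server when the server reports one, so the order only matches the numbering if those timestamps do.

```shell
#!/bin/sh
# Sketch of a daily job: download the PDFs, then merge them oldest-first.
# oldest_first and fetch_and_merge are hypothetical helper names.

# Print the given files sorted by modification time, oldest first.
oldest_first() {
    ls -rt "$@"
}

# Download PDFs one level deep into the current directory (-nd is an
# assumption, not part of the original command), then merge them.
# Relies on word splitting, so file names must not contain spaces.
fetch_and_merge() {
    url=$1
    wget -r -l1 -nd -A.pdf --no-parent "$url" &&
    pdftk $(oldest_first kanti*.pdf) cat output "Kanti_$(date +%F).pdf"
}
```

It would then be called as fetch_and_merge http://linktoX, for example from a cron job.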

seg.server.fault asked Aug 09 '09 19:08


2 Answers

I've also used pdftk in the past with good results.

For listing the files in numeric order, you can instruct sort to ignore the first $n - 1 characters of the filename by doing this:

    ls | sort -n -k 1.$n

So for file*.pdf, where the number starts at character 5:

    $ ls | sort -n -k 1.5
    file1.pdf
    file2.pdf
    file3.pdf
    file4.pdf
    file10.pdf
    file11.pdf
    file20.pdf
    file21.pdf
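The sorted listing can then be passed straight to the merge command. A minimal sketch, assuming the files are named file<N>.pdf and contain no spaces; the function names sorted_pdfs and merge_sorted and the output name are only placeholders:

```shell
#!/bin/sh
# sorted_pdfs and merge_sorted are hypothetical helper names.

# List file*.pdf in numeric order (file2.pdf before file10.pdf).
sorted_pdfs() {
    # -k 1.5 starts the sort key at the 5th character, just after "file"
    ls file*.pdf | sort -n -k 1.5
}

# Merge the files in that order into a dated output file.
# Relies on word splitting, so file names must not contain spaces.
merge_sorted() {
    pdftk $(sorted_pdfs) cat output "Combined_$(date +%F).pdf"
}
```

Because the order comes from the number in the name rather than from timestamps, this should not depend on when each file was downloaded.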
A B answered Oct 12 '22 23:10


I have used pdftk before for such concatenations, as it is readily available on Debian / Ubuntu.

Dirk Eddelbuettel answered Oct 12 '22 23:10