I want to download PDFs from website X and then combine them all into one file, so that I can view them all at once.
What I did:
Step 1: get the PDFs from the website
wget -r -l1 -A.pdf --no-parent http://linktoX
Step 2: combine the PDFs into one
gs -dNOPAUSE -sDEVICE=pdfwrite -sOUTPUTFILE=Combined_`date +%F`.pdf -dBATCH file1.pdf file2.pdf file3.pdf
My problem: I would like to automate this whole process in one script, so that I don't have to do it by hand every day. New PDFs are added to X daily.
So, how can I do step 2 above without giving the full list of PDFs? I tried using file*.pdf in step 2, but it combined the PDFs in a seemingly random order.
The next problem is that the total number of file*.pdf files is not the same every day: sometimes 5 PDFs, sometimes 10. The nice thing is that they are named in order: file1.pdf, file2.pdf, ...
So, I need some help completing step 2 above, such that all PDFs are combined in order and I don't have to give the name of each PDF explicitly.
Thanks.
UPDATE: This solved the problem:
pdftk `ls -rt kanti*.pdf` cat output Kanti.pdf
I used ls -rt because file1.pdf was downloaded first, then file2.pdf, and so on. Plain ls -t put file20.pdf first and file1.pdf last.
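For the automation part of the question, the two steps could be wrapped in one script along the following lines. This is only a sketch under assumptions: the URL is the placeholder from the question, the `pdfs` download directory and the function names are made up for illustration, `wget` and `pdftk` are assumed to be installed, and file names are assumed to contain no spaces or newlines. Like the update above, it orders the files by modification time.

```shell
#!/bin/sh
# Sketch: download PDFs, then merge them oldest-first into a dated file.
# The URL, directory and function names are illustrative assumptions.

url="http://linktoX"      # placeholder from the question
dir="pdfs"                # hypothetical download directory

# Build the dated output name, e.g. Combined_2024-01-31.pdf
output_name() {
    printf 'Combined_%s.pdf\n' "$(date +%F)"
}

fetch_pdfs() {
    # -nd: do not recreate the site's directory tree; -P: save into $dir
    wget -r -l1 -nd -A.pdf --no-parent -P "$dir" "$url"
}

combine_pdfs() {
    out=$(output_name)
    # ls -rt lists the oldest file first. The command substitution is left
    # unquoted on purpose so that each name becomes its own argument, which
    # is why names containing spaces would break this.
    ( cd "$dir" && pdftk $(ls -rt -- *.pdf) cat output "../$out" )
}

# Uncomment to run both steps:
# fetch_pdfs && combine_pdfs
```

The combined file is written outside the download directory so that the next run does not pick it up as an input. Running the script once a day (for example from cron) would then cover the "every day" part.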
I've also used pdftk in the past with good results.
For listing the files in numeric order, you can tell sort to skip the first $n - 1 characters of each filename, so that the sort key starts at character $n:
ls | sort -n -k 1.$n
So if you had file*.pdf, where the number starts at the fifth character:
$ ls | sort -n -k 1.5
file1.pdf
file2.pdf
file3.pdf
file4.pdf
file10.pdf
file11.pdf
file20.pdf
file21.pdf
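The same idea can be wrapped in a small helper that works out the key offset from the prefix length, so the position does not have to be counted by hand. This is a sketch, assuming names of the form prefix, number, .pdf with no newlines in them; `list_pdfs_numeric` is a made-up name.

```shell
#!/bin/sh
# Print the files named <prefix><number>.pdf in the current directory,
# one per line, in numeric order. Assumes names contain no newlines.
list_pdfs_numeric() {
    prefix=$1
    # The numeric key starts at the character right after the prefix.
    start=$(( ${#prefix} + 1 ))
    for f in "$prefix"*.pdf; do
        # Skip the literal pattern when nothing matches.
        [ -e "$f" ] && printf '%s\n' "$f"
    done | sort -n -k "1.$start"
}
```

It could then feed the merge step, for example `pdftk $(list_pdfs_numeric file) cat output Combined.pdf`, which again relies on names without spaces. With GNU coreutils, `sort -V` (version sort) is another way to get the same ordering without computing an offset.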
I have used pdftk before for such concatenations, as pdftk happens to be readily available on Debian / Ubuntu.