I have a multi-page PDF with photographed book pages. I want to remove gradients from every page to prepare for optical character recognition.
This command works fine on a PNG of a single page:
convert page.png \( +clone -blur 0x64 \) -compose minus -composite -channel RGB -negate page_deblurred.png
However, as soon as I try this on a multi-page PDF by using this command...
convert full.pdf \( +clone -blur 0x64 \) -compose minus -composite -channel RGB -negate full_deblurred.pdf
...I get a single-page PDF with inverted colors, overlaid with text from several pages.
How do I tell ImageMagick to process every page the way it does with the PNG and return a multi-page PDF?
As ImageMagick does not seem to be capable of doing this in one shot, I put together a script based on the suggestion Mark Setchell made in a comment on his answer.
#!/usr/bin/env bash
set -e
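tmpdir=$(mktemp -d)
# Remove the temporary directory even if a step fails and set -e aborts the script
trap 'rm -rf "$tmpdir"' EXIT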
echo "Splitting PDF into single pages"
convert -density 288 "$1" "${tmpdir}/page-%03d.png"
for f in "$tmpdir"/page-*.png
do
echo "Processing ${f##*/}"
convert "$f" \( +clone -blur 0x64 \) -compose minus -composite -channel RGB -negate "${f}_gradient_removed.png"
done
pdf_file_name_without_suffix="${1%.pdf}"
echo "Reassembling PDF"
convert "$tmpdir"/*_gradient_removed.png -quality 100 "${pdf_file_name_without_suffix}_gradient_removed.pdf"
rm -rf "${tmpdir}"
It works fine with my material. Your mileage may vary.
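For anyone adapting the script: the file names come from two Bash parameter expansions, `${f##*/}` and `${1%.pdf}`. Here is a minimal illustration of what they produce. The temp path and input name are made-up placeholders, not values from my actual run, and `input` stands in for the script's `$1`.

```shell
#!/usr/bin/env bash
# Hypothetical values, chosen only to illustrate the expansions.
f="/tmp/tmp.abc123/page-007.png"
input="scans/full.pdf"

# ${f##*/} strips the longest prefix ending in "/", leaving only the file name.
page_name="${f##*/}"

# ${input%.pdf} strips a trailing ".pdf"; the new suffix is then appended.
output="${input%.pdf}_gradient_removed.pdf"

echo "$page_name"
echo "$output"
```

So the result is written next to the input file, with `_gradient_removed` inserted before the `.pdf` extension.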