Converting searchable PDF to a non-searchable PDF

Tags:

pdf

ghostscript

I have a PDF which is searchable and I need to convert it into a non-searchable one.

I tried using Ghostscript to convert it to JPEG and then back to PDF, which does the trick, but the resulting file is unacceptably large.

I also tried using Ghostscript to convert the PDF to PS first and then back to PDF, which does the trick as well, but the quality is not good enough.

```
gswin32.exe -q -dNOPAUSE -dBATCH -dSAFER -sDEVICE=pswrite -r1000 -sOutputFile=out.ps in.pdf
gswin32.exe -q -dNOPAUSE -dBATCH -dSAFER -dDEVICEWIDTHPOINTS=596 -dDEVICEHEIGHTPOINTS=834 -dPDFSETTINGS=/ebook -sDEVICE=pdfwrite -sOutputFile=out.pdf out.ps
```

Is there a way to get good quality in the resulting PDF?

Alternatively, is there an easier way to convert a searchable PDF to a non-searchable one?


asked Feb 02 '12 03:02

Steven Yong

2 Answers

You can use Ghostscript to achieve that. You need two steps:

1. Convert the PDF to a PostScript file in which all used fonts are converted to outline shapes. The key here is the -dNOCACHE parameter:
```
gs -o somepdf.ps -dNOCACHE -sDEVICE=pswrite somepdf.pdf
```

2. Convert the PS back to PDF (and maybe delete the intermediate PS file afterwards):
```
gs -o somepdf-with-outlines.pdf -sDEVICE=pdfwrite somepdf.ps
rm somepdf.ps
```

Note that the resulting PDF will very likely be larger than the original one. (Also, unless you add more command line parameters to do otherwise, all images in the original PDF will likely be converted according to Ghostscript's built-in defaults. Still, the quality should be better than in your own attempt to use Ghostscript...)
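The two steps above can be wrapped in a small shell function so the intermediate PostScript file never lands next to your documents. This is only a sketch: the function name `outline_via_ps` and the `GS` and `PS_DEVICE` override variables are made up for this example, and the Ghostscript options are exactly the ones from the two commands above. If your Ghostscript build does not list `pswrite` (check `gs -h`), you could try `PS_DEVICE=ps2write`, but the answer does not confirm that this gives the same outlining.

```shell
# Sketch: PDF -> PostScript (fonts as outlines) -> PDF, via a temp directory.
# GS and PS_DEVICE are hypothetical overrides, not Ghostscript features.
outline_via_ps() {
    src=$1
    dst=$2
    work=$(mktemp -d) || return 1
    ps="$work/outlined.ps"

    # Step 1: write PostScript; -dNOCACHE is the option the answer relies on.
    # Step 2: turn that PostScript back into a PDF.
    "${GS:-gs}" -o "$ps" -dNOCACHE -sDEVICE="${PS_DEVICE:-pswrite}" "$src" &&
        "${GS:-gs}" -o "$dst" -sDEVICE=pdfwrite "$ps"
    status=$?

    # Remove the intermediate PostScript file in either case.
    rm -rf "$work"
    return $status
}
```

Usage would look like `outline_via_ps somepdf.pdf somepdf-with-outlines.pdf`; on Windows you would point `GS` at `gswin32c.exe` or similar.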

Update

Apparently, from version 9.15 (to be released during September/October 2014), Ghostscript will support a new command line parameter:

```
-dNoOutputFonts
```

which will cause the output devices pdfwrite, ps2write and eps2write "to 'flatten' glyphs into 'basic' marking operations (rather than writing fonts to the output)".

This means that the above two steps can be avoided, and the desired result can be achieved with a single command:

```
gs -o somepdf-with-outlines.pdf -dNoOutputFonts -sDEVICE=pdfwrite somepdf.pdf
```

Caveats: I've tested this with a few input files using a self-compiled Ghostscript based on current Git sources. It worked flawlessly in each case.
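To check the result yourself, one option (not part of the answer above) is to see whether any text can still be extracted from the output file. A minimal sketch, assuming `pdftotext` from the poppler utilities is installed; the function name `is_non_searchable` and the `PDFTOTEXT` override variable are invented for this example.

```shell
# Sketch: succeed if no words can be extracted from the given PDF.
# Returns 0 = no extractable text, 1 = text found, 2 = extraction tool failed.
is_non_searchable() {
    # "-" makes pdftotext write the extracted text to stdout.
    text=$("${PDFTOTEXT:-pdftotext}" "$1" -) || return 2

    # Count whitespace-separated words; page-break form feeds count as whitespace.
    words=$(printf '%s' "$text" | wc -w | tr -d '[:space:]')
    [ "$words" -eq 0 ]
}
```

For example: `is_non_searchable somepdf-with-outlines.pdf && echo "no extractable text"`. An empty result here only means that this one tool found no text; it is a quick sanity check rather than a guarantee for every viewer.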

answered Oct 08 '22 07:10

Kurt Pfeifle

A possible way to produce a non-searchable vector PDF from a searchable vector PDF is:

1. Burst the PDF into its single pages:
```
pdftk file.pdf burst
```

2. Convert every single page to SVG with pdftocairo, which is part of the poppler utils (http://poppler.freedesktop.org/):
```
for f in *.pdf; do pdftocairo -svg "$f"; done
```

3. Delete ALL PDFs in the folder (this includes the original file.pdf if it is in the same folder, so work on a copy).

4. Then, with the Batik rasterizer (http://xmlgraphics.apache.org/batik/tools/rasterizer.html), convert ALL SVGs back to PDF (this time the resulting PDFs remain vector graphics but are no longer searchable):
```
java -jar ./batik-rasterizer.jar -m application/pdf *.svg
```

5. Final step: join all resulting single-page PDFs into one multipage PDF file:
```
pdftk *.pdf cat output out.pdf
```
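The steps above could be strung together in one function that works in a temporary directory, so that deleting the per-page PDFs cannot touch the original file. This is a sketch under assumptions: the function name `vector_flatten` and the `PDFTK`, `PDFTOCAIRO`, `JAVA` and `BATIK_JAR` variables are invented here, `BATIK_JAR` must hold an absolute path to batik-rasterizer.jar, and the Batik step is assumed to write each PDF next to its SVG, as it does when run in one folder in the answer.

```shell
# Sketch: searchable vector PDF -> per-page SVG -> non-searchable vector PDF.
# PDFTK, PDFTOCAIRO, JAVA and BATIK_JAR are hypothetical override variables.
vector_flatten() {
    src=$1
    dst=$2
    work=$(mktemp -d) || return 1

    # 1. Burst into single pages; zero-padded names keep the page order.
    "${PDFTK:-pdftk}" "$src" burst output "$work/pg_%04d.pdf" || return 1

    # 2. Convert each page to SVG.
    for f in "$work"/pg_*.pdf; do
        "${PDFTOCAIRO:-pdftocairo}" -svg "$f" "${f%.pdf}.svg" || return 1
    done

    # 3. Delete the searchable single-page PDFs (only those in the temp dir).
    rm -f "$work"/pg_*.pdf

    # 4. Convert the SVGs back to PDF, run inside the temp dir as in the answer.
    ( cd "$work" &&
      "${JAVA:-java}" -jar "$BATIK_JAR" -m application/pdf pg_*.svg ) || return 1

    # 5. Join the single-page PDFs into one multipage file.
    "${PDFTK:-pdftk}" "$work"/pg_*.pdf cat output "$dst"
    status=$?

    rm -rf "$work"
    return $status
}
```

A call might look like `BATIK_JAR=/opt/batik/batik-rasterizer.jar vector_flatten file.pdf out.pdf`, where the jar path is just a placeholder for wherever Batik is installed on your machine.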

answered Oct 08 '22 06:10

Dingo
