Most PDF files found on the Web have compressed and unreadable data streams. Is it possible to uncompress the internal content of a PDF file using Acrobat or Acrobat Distiller, so that the source code can be read with a text editor?
P.S. This question is inspired by this answer, which explains how it can be done with Ghostscript.
In Acrobat Distiller, you can select settings used to convert documents to PDFs, security options, and font information. You also use the Acrobat Distiller window to monitor the jobs you've lined up for PDF conversion.
What is the difference between Acrobat Distiller and Acrobat DC?
Distiller can create PDF files from PS (PostScript) files. Acrobat can create PDF files from many file formats, as well as edit them.
Thank you for your explanation.
PDF files are usually already compressed. In particular, images that are part of a PDF file are nearly always already compressed. Compressing already-compressed data again gains little or nothing; otherwise infinite compression would be possible by running the compression program repeatedly until the file was small enough.
qpdf and pdftk have already been mentioned. To show the commands:
$ qpdf --qdf --object-streams=disable orig.pdf uncompressed-orig.pdf
$ pdftk orig.pdf output uncompressed-orig.pdf uncompress
mutool, however, hasn't been mentioned yet:
$ mutool clean -d -a orig.pdf uncompressed-orig.pdf
mutool is a command-line tool which ships alongside the lightweight MuPDF PDF and document viewer.
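The three commands above can be wrapped in a small shell helper that builds the right command line for whichever tool happens to be installed. This is only a minimal sketch: the function names `uncompress_cmd` and `first_available` are made up for illustration, the flags are exactly the ones quoted above, and file names are assumed to contain no spaces.

```shell
#!/bin/sh
# Print the uncompress command line for a given tool.
# The flags are copied from the commands quoted in this thread.
# Assumption: file names contain no whitespace.
uncompress_cmd() {
    tool=$1
    src=$2
    dst=$3
    case $tool in
        qpdf)   echo "qpdf --qdf --object-streams=disable $src $dst" ;;
        pdftk)  echo "pdftk $src output $dst uncompress" ;;
        mutool) echo "mutool clean -d -a $src $dst" ;;
        *)      echo "unsupported tool: $tool" >&2; return 1 ;;
    esac
}

# Print the name of the first tool found on PATH; fail if none is installed.
first_available() {
    for t in qpdf mutool pdftk; do
        if command -v "$t" >/dev/null 2>&1; then
            echo "$t"
            return 0
        fi
    done
    return 1
}
```

With one of the tools installed, something like `sh -c "$(uncompress_cmd "$(first_available)" orig.pdf uncompressed-orig.pdf)"` would then run the matching command.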
I do not think you can achieve the uncompressing of PDF objects' streams with Acrobat or Distiller (unless you have additional payware plugins available).
Use cpdf:
cpdf -decompress in.pdf -o out.pdf
and then the graphics operators for each page can be read in a text editor. You'll need a copy of the PDF standard as a reference, though.
Disclosure: I am the author of cpdf.
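To check whether a file really came out uncompressed before opening it in an editor, you can look for leftover stream filters. The following is a rough sketch, not a proper PDF parser: it simply greps for filter names, and the helper names `count_flate` and `list_filters` are hypothetical. Depending on the tool, image streams with filters such as `/DCTDecode` may well remain, so a non-empty filter list is not necessarily an error.

```shell
#!/bin/sh
# Count the lines that still mention the /FlateDecode filter.
# -a makes grep treat the (partly binary) PDF as text; -c prints the count.
# grep exits non-zero when the count is 0, so "|| true" keeps the status clean.
count_flate() {
    grep -a -c '/FlateDecode' "$1" || true
}

# List every "...Decode" filter name that occurs in the file, with counts.
list_filters() {
    grep -a -o '/[A-Za-z0-9]*Decode' "$1" | sort | uniq -c
}
```

For example, `count_flate out.pdf` printing `0` suggests that the Flate-compressed streams were expanded.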
This is easy with qpdf and pdftk.
With Adobe Acrobat you can inspect the internal structure after profiling a PDF: run Preflight with some profile (e.g. one that detects PDF syntax errors), then choose Options->Internal PDF structure. There is no way, however, to get something editable with a text editor.