I want to remove metadata from PDF files. I have already tried "exiftool", "pdftk" and "qpdf" (the method proposed in https://gist.github.com/hubgit/6078384 ). These tools claim to remove the metadata, but the values are still in the file: running "grep -a metadata_fieldname file.pdf" still finds the metadata value.
Is there a way to completely delete the metadata from PDF files, i.e. delete all the objects that contain metadata?
I am using Ubuntu. When I create a PDF file with a LaTeX tool (e.g. pdfTeX) or LibreOffice, the tool automatically writes Producer, Creator and sometimes a full banner into the metadata of the PDF file. I want to remove this information from PDF files (basically the metadata stored by the tool that created the PDF).
To blank every value in the PDF information dictionary with pdftk from your Ubuntu terminal, use the following command:
pdftk file.pdf dump_data |sed -e 's/\(InfoValue:\)\s.*/\1\ /g' | pdftk file.pdf update_info - output file_no_meta.pdf
Here file.pdf is the source file and file_no_meta.pdf is the output file.
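To see what the sed step does before it touches a real PDF, you can run it on a few lines that look like "pdftk dump_data" output. This is only a sketch: blank_info_values is a made-up function name, the sample lines are invented, and the pattern uses [[:space:]] as a portable spelling of \s, which should have the same effect here.

```shell
#!/bin/sh
# Hypothetical helper: blank the value on every "InfoValue:" line and pass
# all other lines through unchanged. Reads stdin, writes stdout.
blank_info_values() {
    sed -e 's/^\(InfoValue:\)[[:space:]].*/\1 /'
}

# Invented sample in the shape of "pdftk file.pdf dump_data" output.
printf 'InfoBegin\nInfoKey: Producer\nInfoValue: pdfTeX-1.40.21\nNumberOfPages: 3\n' \
    | blank_info_values
```

The InfoKey lines stay, so pdftk update_info still knows which fields to overwrite; only the text after "InfoValue:" is dropped.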
Next, use the following command to remove XMP metadata:
exiftool -all:all= -overwrite_original file_no_meta.pdf
Finally, check the file's metadata again (add -meta to also print any XMP packet):
pdfinfo file_no_meta.pdf
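If "grep -a" still finds the old values after these steps, one likely reason is that the edit was appended as an incremental update, so the old objects are still physically in the file. ExifTool's documentation describes its PDF edits this way and suggests rewriting the file with qpdf afterwards, which is also what the gist linked in the question does. Below is a rough sketch that chains the three steps and repeats the raw grep check. The function names strip_pdf_metadata and pdf_contains_string are made up for this example, and I am assuming pdftk, exiftool and qpdf are all installed.

```shell
#!/bin/sh
# Sketch only: strip_pdf_metadata and pdf_contains_string are hypothetical
# helper names, not commands shipped with pdftk, exiftool or qpdf.

strip_pdf_metadata() {
    in=$1
    out=$2
    [ -f "$in" ] || { echo "no such file: $in" >&2; return 1; }
    for tool in pdftk exiftool qpdf; do
        command -v "$tool" >/dev/null 2>&1 || {
            echo "missing tool: $tool" >&2
            return 1
        }
    done
    # Work in a fresh directory so pdftk never has to overwrite a file.
    tmpdir=$(mktemp -d) || return 1
    tmp="$tmpdir/step.pdf"

    # 1. Blank every InfoValue line, as in the pdftk command above.
    # 2. Remove XMP (and whatever else exiftool can clear).
    # 3. Rewrite the file so objects left over from earlier edits are dropped.
    pdftk "$in" dump_data \
        | sed -e 's/^\(InfoValue:\)[[:space:]].*/\1 /' \
        | pdftk "$in" update_info - output "$tmp" \
        && exiftool -q -all:all= -overwrite_original "$tmp" \
        && qpdf --linearize "$tmp" "$out"
    status=$?
    rm -rf "$tmpdir"
    return "$status"
}

# Raw byte search, like "grep -a" in the question. A miss is not proof:
# values inside compressed streams or stored as UTF-16/hex will not match.
pdf_contains_string() {
    grep -a -q -F -- "$2" "$1"
}
```

Usage would be something like "strip_pdf_metadata file.pdf file_no_meta.pdf" followed by "pdf_contains_string file_no_meta.pdf pdfTeX && echo still there".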
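You can also use pdftk to drop the Info and XMP metadata from a document by copying its pages into a new PDF. Note that pdftk may write its own Producer and Creator values into the new file, so check the result with pdfinfo: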
pdftk A=mydoc.pdf cat A output mydoc.no_metadata.pdf