
Remove PDF metadata (removing complete PDF metadata)

Tags: pdf, metadata

I want to remove metadata from PDF files. I have already tried "exiftool", "pdftk" and "qpdf" (following the method proposed in https://gist.github.com/hubgit/6078384 ). These tools claim to remove the metadata, but it is still in the file: running "grep -a metadata_fieldname file.pdf" still returns the metadata value.

Is there a way to completely delete the metadata from PDF files, i.e. to delete all the objects that contain metadata?

I am using Ubuntu. When I create a PDF file with a LaTeX tool (e.g. pdfTeX) or with LibreOffice, the tool automatically writes the Producer, the Creator and sometimes the full banner into the metadata of the PDF file. That is the information I want to remove: the metadata written by the tool that created the PDF.

y11 asked Mar 18 '20 11:03


2 Answers

To clear every entry in the PDF Info dictionary with pdftk from an Ubuntu terminal, run:

pdftk file.pdf dump_data | sed -e 's/\(InfoValue:\)\s.*/\1\ /g' | pdftk file.pdf update_info - output file_no_meta.pdf

Here file.pdf is the source file and file_no_meta.pdf is the output file.

Next, use the following command to remove XMP metadata:

exiftool -all:all= -overwrite_original file_no_meta.pdf
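
One caveat: ExifTool's documentation describes its PDF edits as reversible, because it appends an incremental update instead of rewriting the file, so the old values can still show up in the raw bytes (for example with grep -a). The same documentation suggests rewriting the file with qpdf --linearize afterwards. A minimal sketch of that extra step follows; the function name strip_xmp_and_rewrite and the output filename file_clean.pdf are made up for illustration and are not part of the answer above.

```shell
#!/bin/sh
# Hypothetical helper (not from the answer): clear metadata with exiftool,
# then rewrite the file with qpdf so superseded objects should be dropped.
strip_xmp_and_rewrite() {
    in=$1
    out=$2
    # exiftool appends an incremental update; the old data stays in the file.
    exiftool -all:all= -overwrite_original "$in" || return 1
    # qpdf writes a fresh file instead of appending to the existing one.
    qpdf --linearize "$in" "$out"
}

# Usage (assumed filenames):
#   strip_xmp_and_rewrite file_no_meta.pdf file_clean.pdf
```

If the rewrite works as expected, file_clean.pdf is the file to distribute, and file_no_meta.pdf can be deleted.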

Finally, check which metadata is still reported for the file:

pdfinfo file_no_meta.pdf
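
pdfinfo only reports the metadata the file currently exposes. Since the question checks the raw bytes with grep -a, a small loop can run that same check for several field names at once. This is a rough sketch: the function name leftover_fields and the list of field names are assumptions, and strings stored inside compressed streams will not be found this way.

```shell
#!/bin/sh
# Hypothetical helper: print each name that still occurs in the raw bytes.
leftover_fields() {
    file=$1
    shift
    for field in "$@"; do
        # -a treats the binary PDF as text, -q only sets the exit status,
        # -F matches the name literally instead of as a regex.
        if grep -aqF -- "$field" "$file"; then
            printf '%s\n' "$field"
        fi
    done
}

# Usage (assumed field names):
#   leftover_fields file_no_meta.pdf Producer Creator pdfTeX LibreOffice
```

Empty output means none of the given names were found as plain text in the file.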

teemran answered Oct 17 '22 18:10


You can use pdftk to strip all Info and XMP metadata from a document by copying its pages into a new PDF, like this:

pdftk A=mydoc.pdf cat A output mydoc.no_metadata.pdf

linuxhelp answered Oct 17 '22 17:10