First: I am aware of the general comment: Do not track generated files.
Say I want to track generated PDFs and have git ignore the date written into each PDF. That is, I want git to treat two PDFs as identical if the only difference is the date information.
What I tried is a filter that -- in its clean part -- sets the date to some arbitrary value.
(--- comment ----
Basically, the filter does something along these lines:
## dump the pdf metadata to a file and replace the dates
pdftk "$FILENAME" dump_data | sed -e '{N;s/Date\nInfoValue: D:.*/Date\nInfoValue: D:19790101072619/}' > "$TMPFILE"
## update the pdf metadata
pdftk "$FILENAME" update_info "$TMPFILE" output "$TMPFILE2"
) --- end comment ----
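For context, a clean filter like this is typically wired up through .gitattributes and git config. The following is only a sketch: the filter name "pdfdate" and the script path are placeholders, not the names used in my setup.

```shell
# Run inside the repository that holds the PDFs.
# "pdfdate" and the script path below are placeholder names.

# Tell git to run the "pdfdate" filter on every PDF.
echo '*.pdf filter=pdfdate' >> .gitattributes

# The clean command receives the working-tree file on stdin
# and must write the content to be committed to stdout.
git config filter.pdfdate.clean "$HOME/bin/pdf-clean-date.sh"

# Nothing needs to be undone on checkout, so smudge just passes data through.
git config filter.pdfdate.smudge cat
```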
The filter works (the committed PDF has the date set to my arbitrary value), but I ran into the problem described in "Files re-checked out from git repository with 'clean' filter end up with modified status": after a fresh checkout, the files show up as modified.
So, my filter is apparently not what I want to do here.
My question is:
1) Can I use a clever filter approach to get git to ignore the date values in the PDF completely? And how?
or
2) What would be the correct approach if not filters?
Finally solved this with help from the git mailing list. It wasn't a git issue after all, but rather a problem with my filter's expectations of pdftk. (Maybe an encoding thing? I did not dig deeper.)
The helpful message on the git mailing list is here: http://permalink.gmane.org/gmane.comp.version-control.git/224797
Basically, the filter script I wrote was not idempotent, meaning that applying the clean filter a second time to an already cleaned file would change the file again.
Background: When pdftk is used to update the metadata of a PDF with the metadata it extracted from that exact PDF before, to my surprise it changes the PDF file.
So I added a safety check to my filter, and the issue went away.
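A quick way to check whether a filter is idempotent is to run it twice and compare the results. This is a rough sketch, not part of my actual setup; the helper name is_idempotent is made up for illustration, and it assumes the filter reads stdin and writes stdout.

```shell
# is_idempotent FILTER FILE
# Succeeds if running FILTER on its own output leaves the data unchanged.
# FILTER must be a single command name that reads stdin and writes stdout.
is_idempotent() {
    local filter=$1 input=$2
    local once twice status
    once=$(mktemp)
    twice=$(mktemp)
    "$filter" < "$input" > "$once"    # first pass
    "$filter" < "$once" > "$twice"    # second pass on the cleaned data
    cmp -s "$once" "$twice"           # byte-for-byte comparison
    status=$?
    rm -f "$once" "$twice"
    return $status
}
```

For example, is_idempotent ./my-clean-filter.sh some.pdf (both names being placeholders) should succeed for a filter that git can use without reporting spurious modifications.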
For reference, here is the full filter:
#!/bin/bash
## use GNU coreutils on OS X explicitly
## (install via homebrew, for instance:
## > brew install coreutils
## > brew install gnu-sed
## )
if [ "${OSTYPE:0:6}" == "darwin" ]; then
    MKTMP=gmktemp
    SED=gsed
else
    MKTMP=mktemp
    SED=sed
fi
## read the PDF from the file given as first argument, or else from stdin
FILEASARG=true
if [ "$#" == 0 ]; then
    FILEASARG=false
fi
if $FILEASARG ; then
    FILENAME="$1"
else
    FILENAME=`$MKTMP`
    cat /dev/stdin > "${FILENAME}"
fi
TMPFILE=`$MKTMP`
TMPFILE2=`$MKTMP`
TMPFILE3=`$MKTMP`
## dump the pdf metadata to a file and replace the dates
pdftk "$FILENAME" dump_data > "$TMPFILE3"
$SED -e '/Date/{ N; s/Date\nInfoValue: D:.*/Date\nInfoValue: D:19790101072619/ }' < "$TMPFILE3" > "$TMPFILE"
## if the metadata did not change, do nothing
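## (-q and the redirect keep diff from printing into the filter's stdout)
if diff -q "$TMPFILE3" "$TMPFILE" > /dev/null; then
    rm -f "$TMPFILE" "$TMPFILE2" "$TMPFILE3"
    ## FILEASARG is never empty, so the PDF is always written to stdout
    if [ -n "$FILEASARG" ] ; then
        cat "$FILENAME"
    fi
    exit 0
fi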
## update the pdf metadata
pdftk "$FILENAME" update_info "$TMPFILE" output "$TMPFILE2"
## overwrite the original pdf
mv -f "$TMPFILE2" "$FILENAME"
## clean up
rm -f "$TMPFILE" "$TMPFILE2" "$TMPFILE3"
## FILEASARG is never empty, so the PDF is always written to stdout
if [ -n "$FILEASARG" ] ; then
    cat "$FILENAME"
fi
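To see what the sed step does on its own, it can be tried on a few lines of text shaped like the pdftk dump_data output. The sample metadata below is invented for illustration, and the wrapper name normalize_dates is hypothetical. The expression relies on GNU sed (hence gsed on OS X).

```shell
# normalize_dates: replace every date value in pdftk-style metadata
# (read from stdin) with one fixed timestamp. Requires GNU sed.
normalize_dates() {
    sed -e '/Date/{ N; s/Date\nInfoValue: D:.*/Date\nInfoValue: D:19790101072619/ }'
}

# Invented sample metadata, shaped like "pdftk file.pdf dump_data" output.
normalize_dates <<'EOF'
InfoKey: Title
InfoValue: Example
InfoKey: CreationDate
InfoValue: D:20130520123456+02'00'
EOF
```

With GNU sed, the Title entry passes through untouched and the CreationDate value becomes D:19790101072619. Running the output through normalize_dates again yields the same text, so this step is idempotent at the metadata level; the non-idempotence came from pdftk rewriting the PDF itself.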