I've been grading academic papers for a couple of years now, and I've started to see recurring patterns in spelling and grammar mistakes. I've also noticed that less experienced academics tend to use certain constructs that immediately raise "smells" for more experienced researchers.
I would like to automagically recognize and annotate these in PDF files. Is anyone aware of a script that I could use to annotate and comment on PDF files automatically? Perhaps it's dead simple, but I feel like I'm one of the first to ask this question.
Programming is no problem.
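To solve this task, you need 3 things: a tool that extracts the text from the PDF together with its position, your own logic that parses the text and decides which comments to add, and a tool that writes those comments into the PDF.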
PDFlib's TET (Text Extraction Toolkit) lets you extract text from any PDF. It's one of the most powerful PDF text extraction tools that you can drive from the command line and from scripts. It can handle oddities (from the point of view of text extraction) such as ligatures and different text encodings. More importantly, it can tell you the exact page number and the coordinates on the PDF page for any character or text string it extracted.
After you have parsed the text and your logic has decided which comment to add on which page, you can use PDFlib or Ghostscript to add comments ("annotations") to the original PDF.
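The "parse and decide" step could look roughly like the Python sketch below. It assumes your extraction tool has already given you one record per line of text, with a page number and a position. The `TextLine` and `Finding` records, the `RULES` table and its two example patterns are all hypothetical names made up for illustration; they are not part of TET or any other tool.

```python
import re
from dataclasses import dataclass


@dataclass
class TextLine:
    """One extracted line of text (hypothetical record, filled from your extractor's output)."""
    page: int    # 1-based page number
    x: float     # left edge, in PDF points
    y: float     # bottom edge, in PDF points
    text: str


@dataclass
class Finding:
    """A comment to attach at a given position on a given page."""
    page: int
    x: float
    y: float
    comment: str


# Illustrative rules only: (compiled pattern, comment to attach).
RULES = [
    (re.compile(r"\bgrammer\b", re.IGNORECASE),
     'Spelling: "grammer" should be "grammar".'),
    (re.compile(r"\bit is well known that\b", re.IGNORECASE),
     "Smell: unsupported claim, cite a source."),
]


def find_issues(lines):
    """Return one Finding for every rule that matches a line."""
    findings = []
    for line in lines:
        for pattern, comment in RULES:
            if pattern.search(line.text):
                findings.append(Finding(line.page, line.x, line.y, comment))
    return findings
```

Each `Finding` carries what an annotation needs later on: the page, a position, and the comment text.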
I'm not delivering a tutorial about how to use PDFlib in order to add annotations to existing PDFs here. But I will leak some insider knowledge about how Ghostscript can do it:
To add an annotation to an existing PDF with Ghostscript, first create a text file called my-pdfmarks.txt (or whatever name you prefer). Then type the content of your annotation into that text file, using the following syntax:
[ /Title (Annotation experiments by -pipitas-)
/Author (pipitas)
/Subject (I'm trying to add annotations to existing PDFs with the help of Ghostscript...)
/Keywords (comma, separated, keywords, spelling mistakes, grammar mistakes, raising "smells")
/ModDate (D:20101219192842)
/CreationDate (D:20101219092842)
/Creator (pipitas' brainz)
/Producer (Ghostscript under the direction of pipitas)
/DOCINFO pdfmark
[ /Contents (Smell: This statement was bloody well rebutted by decades of academic research...)
/Rect [10 10 50 50]
/Subtype /Text
/Name /Note
/SrcPg 2
/Open true
/ModDate (D:20101220193344)
/Title (A Comment on Page 2)
/Color [.5 .5 0]
/ANN pdfmark
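If you have many comments, you will probably want to generate such /ANN entries rather than type them. Here is a minimal Python sketch of that idea. The function names `ps_string` and `note_pdfmark` are my own invention, and the sketch assumes plain ASCII comment text; it only escapes backslashes and parentheses, which is what PostScript literal strings require.

```python
def ps_string(text):
    """Wrap text as a PostScript literal string, escaping backslashes and parentheses."""
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return "(" + escaped + ")"


def note_pdfmark(page, rect, contents, title, color=(0.5, 0.5, 0.0)):
    """Return one /ANN pdfmark entry for a text note, in the syntax shown above.

    page  -- 1-based page number, written as /SrcPg
    rect  -- (x1, y1, x2, y2) in PDF points, written as /Rect
    color -- RGB triple with components between 0 and 1
    """
    x1, y1, x2, y2 = rect
    lines = [
        "[ /Contents " + ps_string(contents),
        f"  /Rect [{x1:g} {y1:g} {x2:g} {y2:g}]",
        "  /Subtype /Text",
        "  /Name /Note",
        f"  /SrcPg {page}",
        "  /Open true",
        "  /Title " + ps_string(title),
        "  /Color [{:g} {:g} {:g}]".format(*color),
        "  /ANN pdfmark",
    ]
    return "\n".join(lines) + "\n"
```

Concatenate the returned strings for all your comments and write the result to my-pdfmarks.txt.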
Then run a Ghostscript command like the following. I'm assuming Windows here; on Linux/Unix/Mac OS X, use gs instead of gswin32c.exe as the executable, and \ instead of ^ as the line continuation character:
gswin32c.exe ^
-o original-annotated.pdf ^
-sDEVICE=pdfwrite ^
-dPDFSETTINGS=/prepress ^
original.pdf ^
my-pdfmarks.txt
Voila! Your output PDF now has an annotation on page 2.
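If you drive this from a script, you can build the same command as an argument list and skip the line continuation characters altogether. The Python sketch below only assembles the arguments; `ghostscript_command` is a hypothetical helper name, and the executable name it picks by default is a guess based on the platform that you may need to override for your installation.

```python
import sys


def ghostscript_command(source_pdf, pdfmarks_file, output_pdf, executable=None):
    """Build the argument list for the Ghostscript call shown above."""
    if executable is None:
        # Assumption: gswin32c.exe on Windows, gs everywhere else.
        executable = "gswin32c.exe" if sys.platform.startswith("win") else "gs"
    return [
        executable,
        "-o", output_pdf,
        "-sDEVICE=pdfwrite",
        "-dPDFSETTINGS=/prepress",
        source_pdf,       # the original PDF comes first
        pdfmarks_file,    # the pdfmark file follows it
    ]


# To actually run it:
#   import subprocess
#   subprocess.run(ghostscript_command("original.pdf", "my-pdfmarks.txt",
#                                      "original-annotated.pdf"), check=True)
```

Passing a list to `subprocess.run` also avoids shell quoting problems with file names that contain spaces.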
Now you probably didn't understand what exactly you were doing:
You can tweak every parameter value (the part after each keyword) in the my-pdfmarks.txt file EXCEPT the following:
/DOCINFO pdfmark
/Subtype /Text
/Name /Note
/ANN pdfmark
For example, to make the annotation appear in pure red, use /Color [1 0 0].
In order to fully understand the pdfmark syntax (and add more tweaks to your procedure), you'll need to google for Adobe's pdfmark Reference Manual and read that.
Since you said 'programming is no problem' you now have all the building blocks to automate this with any scripting language of your choice.
If I were you, I would start with the PDF Library SDK, which supports the things you're looking for.
One drawback is that you have to apply for it and Adobe may refuse your request.
EDIT:
PDFedit seems promising. It's an open source GUI application that allows you to modify PDFs manually or by scripting.