Parse annotations from a PDF

Tags: python, pdf

I want a Python function that takes a PDF and returns a list of the text of the note annotations in the document. I have looked at python-poppler (https://code.launchpad.net/~poppler-python/poppler-python/trunk), but I cannot figure out how to get it to give me anything useful.

I found the get_annot_mapping method and modified the bundled demo program to call it via self.current_page.get_annot_mapping(), but I have no idea what to do with an AnnotMapping object. It seems not to be fully implemented; it provides only the copy method.

If there are any other libraries that provide this function, that's fine as well.

asked Jul 09 '09 at 19:07 by davidb



1 Answer

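In case anyone is looking for working code, here is a script I use. It prints every annotation that is not a link, with its page number, modification date, type, and contents, followed by a count.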

```python
# Python 2 script for the legacy python-poppler (pygtk) bindings.
import os
import sys
import urllib

import poppler


def main():
    input_filename = sys.argv[1]

    # poppler opens documents by URI, so turn the path into a file:// URL.
    # http://blog.hartwork.org/?p=612
    uri = 'file://%s' % urllib.pathname2url(os.path.abspath(input_filename))
    document = poppler.document_new_from_file(uri, None)  # None: no password
    n_pages = document.get_n_pages()
    all_annots = 0

    for i in range(n_pages):
        page = document.get_page(i)
        # Each mapping pairs an annotation (.annot) with its area on the page.
        for annot_mapping in page.get_annot_mapping():
            annot = annot_mapping.annot
            annot_type = annot.get_annot_type()
            # Skip hyperlinks; report notes, highlights, etc.
            if annot_type.value_name != 'POPPLER_ANNOT_LINK':
                all_annots += 1
                print('page: {0:3}, {1:10}, type: {2:10}, content: {3}'.format(
                    i + 1, annot.get_modified(), annot_type.value_nick,
                    annot.get_contents()))

    if all_annots > 0:
        print(str(all_annots) + " annotation(s) found")
    else:
        print("no annotations found")


if __name__ == "__main__":
    main()
```
answered Oct 02 '22 at 12:10 by Enno Gröper