I want a Python function that takes a PDF and returns a list of the text of the note annotations in the document. I have looked at python-poppler (https://code.launchpad.net/~poppler-python/poppler-python/trunk) but I cannot figure out how to get it to give me anything useful.
I found the get_annot_mapping method and modified the demo program provided to call it via self.current_page.get_annot_mapping(), but I have no idea what to do with an AnnotMapping object. It seems to not be fully implemented, providing only the copy method.
If there are any other libraries that provide this function, that's fine as well.
In Adobe Acrobat, choose File > Print. In the Print dialog box, click the Summarize Comments button. When prompted "Do you want to include the text of summarized comments...," click Yes.
In case somebody is looking for working code, here is a script I use.
import poppler
import sys
import urllib
import os

def main():
    input_filename = sys.argv[1]
    # poppler wants a file:// URI, not a plain path
    # http://blog.hartwork.org/?p=612
    document = poppler.document_new_from_file(
        'file://%s' % urllib.pathname2url(os.path.abspath(input_filename)),
        None)
    n_pages = document.get_n_pages()
    all_annots = 0

    for i in range(n_pages):
        page = document.get_page(i)
        annot_mappings = page.get_annot_mapping()
        num_annots = len(annot_mappings)
        if num_annots > 0:
            for annot_mapping in annot_mappings:
                # skip hyperlinks, report every other annotation type
                if annot_mapping.annot.get_annot_type().value_name != 'POPPLER_ANNOT_LINK':
                    all_annots += 1
                    print('page: {0:3}, {1:10}, type: {2:10}, content: {3}'.format(
                        i+1,
                        annot_mapping.annot.get_modified(),
                        annot_mapping.annot.get_annot_type().value_nick,
                        annot_mapping.annot.get_contents()))

    if all_annots > 0:
        print(str(all_annots) + " annotation(s) found")
    else:
        print("no annotations found")

if __name__ == "__main__":
    main()
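Since the question asked for a function that returns a list rather than printing, the same loop could be wrapped roughly like this. Treat it as a sketch: note_texts and skip_types are names I made up for illustration, and it assumes the document object exposes the same methods the script above calls (get_n_pages, get_page, get_annot_mapping, and so on). I have not checked it against every version of the bindings.

```python
def note_texts(document, skip_types=("POPPLER_ANNOT_LINK",)):
    """Collect the text of the annotations in an already-opened document.

    `document` is assumed to behave like the object returned by
    poppler.document_new_from_file() in the script above. The function
    name and the skip_types parameter are hypothetical, not part of
    the poppler bindings.
    """
    texts = []
    for i in range(document.get_n_pages()):
        page = document.get_page(i)
        for mapping in page.get_annot_mapping():
            annot = mapping.annot
            # Ignore hyperlinks (and anything else listed in skip_types).
            if annot.get_annot_type().value_name in skip_types:
                continue
            contents = annot.get_contents()
            # Some annotations carry no text at all; leave those out.
            if contents:
                texts.append(contents)
    return texts
```

You would open the document the same way the script does and then call note_texts(document). If you only want sticky notes, you could filter on one specific annotation type instead of just excluding links.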