Parse annotations from a pdf

Tags: python, pdf

I want a Python function that takes a PDF and returns a list of the text of the note annotations in the document. I have looked at python-poppler (https://code.launchpad.net/~poppler-python/poppler-python/trunk), but I cannot figure out how to get it to give me anything useful.

I found the get_annot_mapping method and modified the provided demo program to call it via self.current_page.get_annot_mapping(), but I have no idea what to do with an AnnotMapping object. It does not seem to be fully implemented, as it provides only the copy method.

If there are any other libraries that provide this function, that's fine as well.

asked Jul 09 '09 19:07

davidb

1 Answer

In case somebody is looking for working code, here is a script I use.

import poppler
import sys
import urllib
import os


def main():
    input_filename = sys.argv[1]
    # http://blog.hartwork.org/?p=612
    # Note: urllib.pathname2url is the Python 2 location of this helper;
    # in Python 3 it lives in urllib.request.
    document = poppler.document_new_from_file('file://%s' % \
        urllib.pathname2url(os.path.abspath(input_filename)), None)
    n_pages = document.get_n_pages()
    all_annots = 0

    for i in range(n_pages):
        page = document.get_page(i)
        annot_mappings = page.get_annot_mapping()
        num_annots = len(annot_mappings)
        if num_annots > 0:
            for annot_mapping in annot_mappings:
                # Skip link annotations; report everything else.
                if annot_mapping.annot.get_annot_type().value_name != 'POPPLER_ANNOT_LINK':
                    all_annots += 1
                    print('page: {0:3}, {1:10}, type: {2:10}, content: {3}'.format(i+1, annot_mapping.annot.get_modified(), annot_mapping.annot.get_annot_type().value_nick, annot_mapping.annot.get_contents()))

    if all_annots > 0:
        print(str(all_annots) + " annotation(s) found")
    else:
        print("no annotations found")


if __name__ == "__main__":
    main()
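Since the question asks for a function that returns a list rather than a script that prints, the same loop can be wrapped as in the sketch below. It relies only on the calls already used in the script (get_n_pages, get_page, get_annot_mapping, get_annot_type().value_name, get_contents). The function name collect_annotation_texts, the skip_types parameter and the choice to drop annotations with empty contents are my own assumptions, not part of python-poppler.

```python
def collect_annotation_texts(document, skip_types=("POPPLER_ANNOT_LINK",)):
    """Return the contents of the annotations in an already opened document.

    `document` is assumed to behave like the object returned by
    poppler.document_new_from_file() in the script above.
    `skip_types` is a hypothetical parameter: annotation type names
    (as reported by value_name) that should be ignored.
    """
    texts = []
    for page_index in range(document.get_n_pages()):
        page = document.get_page(page_index)
        for annot_mapping in page.get_annot_mapping():
            annot = annot_mapping.annot
            if annot.get_annot_type().value_name in skip_types:
                continue
            contents = annot.get_contents()
            if contents:  # assumption: annotations without text are not wanted
                texts.append(contents)
    return texts
```

Open the document exactly as in the script and pass it in, e.g. texts = collect_annotation_texts(document). To keep only note annotations, compare value_name against whatever type the script's "type" column reports for your notes instead of merely excluding links.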

answered Oct 02 '22 12:10

Enno Gröper
