I'm having a lot of trouble getting a good grasp on decorators despite having read many an article on the subject (including [this][1] very popular one on SO). I suspect I must be stupid, but with all the stubbornness that comes with being stupid, I've decided to try to figure this out.
That, and I suspect I have a good use case...
Below is some code from a project of mine that extracts text from PDF files. Processing involves three steps:
I recently learned about context managers and the `with` statement, and this seemed like a good use case for them. So I started by defining the `PDFMinerWrapper` class:
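```python
import logging

# Legacy (pre-2013) pdfminer module layout; newer pdfminer.six moved PDFDocument
from pdfminer.pdfparser import PDFParser, PDFDocument

log = logging.getLogger(__name__)


class PDFMinerWrapper(object):
    '''
    Usage:
    with PDFMinerWrapper('/path/to/file.pdf') as doc:
        doc.dosomething()
    '''
    def __init__(self, pdf_doc, pdf_pwd=''):
        self.pdf_doc = pdf_doc
        self.pdf_pwd = pdf_pwd

    def __enter__(self):
        self.pdf = open(self.pdf_doc, 'rb')
        try:
            parser = PDFParser(self.pdf)  # create a parser object associated with the file object
            doc = PDFDocument()           # create a PDFDocument object that stores the document structure
            parser.set_document(doc)      # connect the parser and document objects
            doc.set_parser(parser)
            doc.initialize(self.pdf_pwd)  # pass '' if no password required
        except Exception:
            self.pdf.close()  # __exit__ is not called if __enter__ raises
            raise
        return doc

    def __exit__(self, exc_type, exc_value, tb):
        self.pdf.close()
        # if we have an error, log it (with its traceback) and suppress it
        if exc_value is not None:
            log.error("Error while processing %s", self.pdf_doc,
                      exc_info=(exc_type, exc_value, tb))
            return True  # a true return value tells `with` to swallow the exception
        return False
```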
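What makes the error handling "graceful" is the return value of `__exit__`: the `with` statement checks it for truth, and a true value means the exception is swallowed. (Returning the exception instance, as a first draft might, has the same effect, because exception instances are truthy.) The sketch below shows the protocol with a hypothetical stand-in `FakeFile` class instead of a real PDF, so it runs without PDFMiner installed.

```python
class FakeFile(object):
    """Hypothetical stand-in resource, used only to illustrate the protocol."""

    def __init__(self):
        self.closed = False
        self.logged = []

    def __enter__(self):
        return self  # this is the object bound by `as`

    def __exit__(self, exc_type, exc_value, tb):
        self.closed = True  # always runs, like self.pdf.close()
        if exc_value is not None:
            self.logged.append(repr(exc_value))
            return True   # truthy: the exception is suppressed
        return False      # falsy: nothing to suppress


res = FakeFile()
with res:
    raise ValueError("bad page")
print("still running; closed =", res.closed)  # prints: still running; closed = True
```

The flip side is that suppression also hides bugs in the body of the `with` block, so callers only ever see `None` results rather than an exception.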
Now I can easily work with a PDF file and be sure that it will handle errors gracefully. In theory, all I need to do is something like this:
```python
with PDFMinerWrapper('/path/to/pdf') as doc:
    foo(doc)
```
This is great, except that I need to check that the PDF document is extractable before applying a function to the object returned by PDFMinerWrapper. My current solution involves an intermediate step.
I'm working with a class I call Pamplemousse which serves as an interface to work with the PDFs. It, in turn, uses PDFMinerWrapper each time an operation must be performed on the file to which the object has been linked.
Here is some (abridged) code that demonstrates its use:
```python
# Legacy pdfminer module layout (newer pdfminer.six moved this exception)
from pdfminer.pdfparser import PDFNoOutlines


class Pamplemousse(object):
    def __init__(self, inputfile, passwd='', enc='utf-8'):
        self.pdf_doc = inputfile
        self.passwd = passwd
        self.enc = enc

    def with_pdf(self, fn, *args):
        result = None
        with PDFMinerWrapper(self.pdf_doc, self.passwd) as doc:
            if doc.is_extractable:  # This is the test I need to perform
                # apply function and return result
                result = fn(doc, *args)
        return result

    def _parse_toc(self, doc):
        toc = []
        try:
            toc = [(level, title) for level, title, dest, a, se in doc.get_outlines()]
        except PDFNoOutlines:
            pass
        return toc

    def get_toc(self):
        return self.with_pdf(self._parse_toc)
```
Any time I wish to perform an operation on the PDF file, I pass the relevant function to the with_pdf method along with its arguments. The with_pdf method, in turn, uses the with statement to exploit the context manager of PDFMinerWrapper (thus ensuring graceful handling of exceptions) and executes the check before actually applying the function it has been passed.
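Stripped of PDFMiner, `with_pdf` is a higher-order function: it takes the operation as an argument and calls it inside the `with` block only once the check passes. The sketch below runs that control flow with hypothetical stand-ins (`FakeDoc`, `fake_wrapper`, and an `extractable` flag in place of a real file); `contextlib.contextmanager` is used only to build the stub quickly, not as a suggested replacement for `PDFMinerWrapper`.

```python
from contextlib import contextmanager


class FakeDoc(object):
    """Hypothetical stand-in for a PDFMiner document object."""

    def __init__(self, is_extractable):
        self.is_extractable = is_extractable

    def get_outlines(self):
        # 5-tuples shaped like the ones _parse_toc unpacks: (level, title, dest, a, se)
        return [(1, "Intro", None, None, None), (2, "Details", None, None, None)]


@contextmanager
def fake_wrapper(extractable):
    """Hypothetical replacement for PDFMinerWrapper: yields a FakeDoc."""
    yield FakeDoc(extractable)


def with_pdf(fn, extractable, *args):
    # Same shape as Pamplemousse.with_pdf: enter the context, check, then apply fn
    result = None
    with fake_wrapper(extractable) as doc:
        if doc.is_extractable:
            result = fn(doc, *args)
    return result


def parse_toc(doc):
    return [(level, title) for level, title, dest, a, se in doc.get_outlines()]


print(with_pdf(parse_toc, True))   # [(1, 'Intro'), (2, 'Details')]
print(with_pdf(parse_toc, False))  # None
```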
My question is as follows:
I would like to simplify this code such that I do not have to explicitly call Pamplemousse.with_pdf. My understanding is that decorators could be of help here, so:
How would I write a decorator that wraps the call in the `with` statement and executes the extractability check?

The way I interpreted your goal was to be able to define multiple methods on your `Pamplemousse` class without constantly having to wrap them in that call. Here is a really simplified version of what it might be:
```python
import functools


def if_extractable(fn):
    # this expects to be wrapping a Pamplemousse method
    @functools.wraps(fn)  # keep fn's name and docstring on the wrapper
    def wrapped(self, *args):
        print("wrapper(): Calling %s with %r" % (fn, args))
        result = None
        with PDFMinerWrapper(self.pdf_doc) as doc:
            if doc.is_extractable:
                result = fn(self, doc, *args)
        return result
    return wrapped


class Pamplemousse(object):
    def __init__(self, inputfile):
        self.pdf_doc = inputfile

    # get_toc will only get called if the wrapper check
    # passes the extractable test
    @if_extractable
    def get_toc(self, doc, *args):
        print("get_toc(): %r %r %r" % (self, doc, args))
```
The decorator `if_extractable` is just a function, but it expects to be used on instance methods of your class.
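Why a plain function works as a method decorator: `@if_extractable` above `def get_toc` is shorthand for `get_toc = if_extractable(get_toc)`, evaluated once when the class body runs. The returned `wrapped` is an ordinary function stored on the class, so Python binds it like any other method and passes the instance in as `self`. The sketch uses a hypothetical `Thing` class and `if_ready` decorator (a `ready` flag stands in for `doc.is_extractable`) so it runs without PDFMiner.

```python
import functools


def if_ready(fn):
    """Hypothetical decorator; `self.ready` stands in for the extractability check."""
    @functools.wraps(fn)
    def wrapped(self, *args):
        if self.ready:
            # inject an extra argument, the way if_extractable injects `doc`
            return fn(self, "doc-for-" + self.name, *args)
        return None
    return wrapped


class Thing(object):
    def __init__(self, name, ready):
        self.name = name
        self.ready = ready

    @if_ready  # same as writing `describe = if_ready(describe)` after the def
    def describe(self, doc, suffix=""):
        return doc + suffix


t = Thing("a", ready=True)
print(t.describe("!"))                     # doc-for-a!
print(Thing("b", ready=False).describe())  # None
print(Thing.describe.__name__)             # describe (preserved by functools.wraps)
```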
The decorated `get_toc`, which used to delegate to a private method, now simply expects to receive a `doc` object and the args if the check passes. Otherwise it isn't called at all and the wrapper returns `None`.
With this, you can keep defining your operation functions to expect a `doc`.
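As a rough illustration of reusing one decorator across several operations, here is a self-contained toy version. `ToyPamplemousse`, its `get_path`/`shout` methods, `FakeDoc`, and `fake_miner_wrapper` are all hypothetical stand-ins; a file named `locked.pdf` plays the role of a non-extractable PDF.

```python
import functools
from contextlib import contextmanager


class FakeDoc(object):
    """Hypothetical stand-in for a PDFMiner document."""

    def __init__(self, path):
        self.path = path
        self.is_extractable = not path.endswith("locked.pdf")


@contextmanager
def fake_miner_wrapper(path):
    """Hypothetical stand-in for PDFMinerWrapper."""
    yield FakeDoc(path)


def if_extractable(fn):
    @functools.wraps(fn)
    def wrapped(self, *args):
        with fake_miner_wrapper(self.pdf_doc) as doc:
            if doc.is_extractable:
                return fn(self, doc, *args)
        return None  # check failed: the operation is never called
    return wrapped


class ToyPamplemousse(object):
    def __init__(self, inputfile):
        self.pdf_doc = inputfile

    @if_extractable
    def get_path(self, doc):        # hypothetical operation with no extra args
        return doc.path

    @if_extractable
    def shout(self, doc, word):     # hypothetical operation with one extra arg
        return word.upper()


print(ToyPamplemousse("report.pdf").get_path())    # report.pdf
print(ToyPamplemousse("report.pdf").shout("toc"))  # TOC
print(ToyPamplemousse("locked.pdf").shout("toc"))  # None
```

Each new operation only has to declare `doc` as its second parameter; opening, checking, and cleanup all stay in one place.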
You could even add some type checking to make sure it's wrapping the expected class:
```python
def if_extractable(fn):
    def wrapped(self, *args):
        if not hasattr(self, 'pdf_doc'):
            raise TypeError('if_extractable() is wrapping '
                            'a non-Pamplemousse object')
        ...
```
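Note that this check can only run when the method is called, not when it is decorated, because `self` does not exist until then. The sketch below demonstrates that timing with two hypothetical classes, `HasDoc` and `NoDoc`; the PDF-opening part of the real wrapper is deliberately left out and replaced by a direct call to `fn`.

```python
import functools


def if_extractable(fn):
    @functools.wraps(fn)
    def wrapped(self, *args):
        # duck-typed check: any object with a pdf_doc attribute passes
        if not hasattr(self, 'pdf_doc'):
            raise TypeError('if_extractable() is wrapping '
                            'a non-Pamplemousse object')
        return fn(self, *args)  # simplified: no PDF is opened in this sketch
    return wrapped


class HasDoc(object):
    pdf_doc = "file.pdf"

    @if_extractable
    def name(self):
        return self.pdf_doc


class NoDoc(object):
    @if_extractable  # decorating succeeds; the error appears only on call
    def name(self):
        return "never reached"


print(HasDoc().name())  # file.pdf
try:
    NoDoc().name()
except TypeError as exc:
    print("TypeError:", exc)
```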