Warnings on pdfminer

I found this script on Stack Overflow and (slightly) modified it to work on Python 3.3:

from pdfminer.pdfinterp import PDFResourceManager, process_pdf
from pdfminer.converter import TextConverter
from pdfminer.layout import LAParams
from io import StringIO

def convert_pdf(path):

    rsrcmgr = PDFResourceManager()
    retstr = StringIO()
    codec = 'utf-8'
    laparams = LAParams()
    device = TextConverter(rsrcmgr, retstr, laparams=laparams)

    fp = open(path, 'rb')
    process_pdf(rsrcmgr, device, fp)
    fp.close()
    device.close()

    string = retstr.getvalue()
    retstr.close()
    return string


print(convert_pdf('abc.pdf'))

It works fine; however, I seem to be having two issues:

  • While running the script I get tons of warnings:

    WARNING:root:undefined: PDFCIDFont: basefont='LKOELN+Wingdings-Regular', cidcoding='Adobe-Identity', 139
    WARNING:root:undefined: PDFCIDFont: basefont='LKKPCF+Wingdings2', cidcoding='Adobe-Identity', 132

In the printed text these show up as (cid:139). How do I catch these warnings and replace that text with something else?

  • Note that I have a codec line. In the original script it is passed as an argument to TextConverter(rsrcmgr, retstr, laparams=laparams); however, when I do that I get:

    Traceback (most recent call last):
      File "C:/Users/rodrigo/Desktop/csp_pdf/csp_pdf2.py", line 46, in <module>
        convert_pdf('abc.pdf')
      File "C:/Users/rodrigo/Desktop/csp_pdf/csp_pdf2.py", line 33, in convert_pdf
        device = TextConverter(rsrcmgr, retstr, codec = 'utf-8', laparams=laparams)
    TypeError: __init__() got an unexpected keyword argument 'codec'

Is this related to the first issue?

Thanks!

rodrigocf asked Apr 21 '15 04:04


1 Answer

Unfortunately, pdfminer3k logs to the Python root logger (IMHO, PDFMiner should implement logging correctly). So it is not possible to turn its logging down in the usual way, like this:

logging.getLogger("pdfminer").setLevel(logging.WARNING)

Bummer!
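
A small sketch of why the per-name call has no effect in this situation. `noisy_library_call` is a hypothetical stand-in for library code that uses the module-level `logging.warning()`; it is not PDFMiner's actual code, just an illustration of a record that goes straight to the root logger.

```python
import io
import logging

# Capture log output in memory so the effect is easy to inspect.
stream = io.StringIO()
root = logging.getLogger()
root.addHandler(logging.StreamHandler(stream))


def noisy_library_call():
    # Hypothetical stand-in for a library that logs through the
    # module-level function, which sends the record to the root logger.
    logging.warning("undefined: PDFCIDFont")


# Raising the level of a logger named "pdfminer" changes nothing here,
# because the record never passes through that logger.
logging.getLogger("pdfminer").setLevel(logging.ERROR)
noisy_library_call()

captured = stream.getvalue()
print(captured)
```

The warning still reaches the handler, which is why a fix has to target the root logger itself.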

I did this and it works™:

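    # PDFMiner logs through the root logger, so raise the root level to hide its warnings
    logging.getLogger().setLevel(logging.ERROR)
    # propagate is a Logger attribute, so set it on your own logger
    # (assumed here to be named after the module), not on the logging module
    logging.getLogger(__name__).propagate = False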

This sets the root logger's level to ERROR. Because PDFMiner logs to the root logger, its warnings are dropped, while your own loggers keep working as long as they set their own level.
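
The effect of the root level can be checked with a minimal sketch like the one below. The logger name `myapp` and the messages are made up for illustration. It assumes your own logger has an explicit level, since a logger without one inherits its effective level from the root logger.

```python
import io
import logging

# Collect everything in memory, tagged with logger name and level.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(name)s:%(levelname)s:%(message)s"))

root = logging.getLogger()
root.addHandler(handler)
root.setLevel(logging.ERROR)  # hides WARNING records logged on the root logger

# "myapp" is a hypothetical name for your own logger. It needs an explicit
# level; otherwise it would inherit ERROR from the root logger.
app_log = logging.getLogger("myapp")
app_log.setLevel(logging.INFO)

logging.warning("library noise")  # dropped: below the root logger's level
app_log.info("my own message")    # kept: checked against app_log's own level

output = stream.getvalue()
print(output)
```

Records propagated from `myapp` go to the root logger's handlers without being checked against the root logger's level, which is why the INFO message survives.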

I also needed to set propagation to False because, after using PDFMiner, I was getting duplicate log entries. These were caused by the root logger handling the records a second time.
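
A sketch of how such duplicates can arise and how `propagate` removes them. It assumes the duplicates come from a handler on the root logger (the module-level `logging.warning()` installs one through `logging.basicConfig()` when the root logger has none) combined with a handler on your own logger. The name `myapp` is hypothetical.

```python
import io
import logging

stream = io.StringIO()

# A handler on the root logger, similar to the one basicConfig() installs.
root = logging.getLogger()
root.addHandler(logging.StreamHandler(stream))

# "myapp" is a hypothetical logger that also has its own handler.
app_log = logging.getLogger("myapp")
app_log.setLevel(logging.INFO)
app_log.addHandler(logging.StreamHandler(stream))

app_log.info("first")      # emitted twice: own handler, then root handler
app_log.propagate = False  # set on the logger object, not on the module
app_log.info("second")     # emitted once: no longer passed up to the root

lines = stream.getvalue().splitlines()
print(lines)  # ['first', 'first', 'second']
```

Setting `propagate` on the root logger itself would not help, since the root logger has no parent to propagate to.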

Pullie answered Sep 28 '22 18:09