How to read pdf file using pdfminer3k?

1 Answers

I have corrected Lisa's code. It works now!

    fp = open(path, 'rb')
    from pdfminer.pdfparser import PDFParser, PDFDocument
    from pdfminer.pdfinterp import PDFResourceManager, PDFPageInterpreter
    from pdfminer.converter import PDFPageAggregator
    from pdfminer.layout import LAParams, LTTextBox, LTTextLine

    parser = PDFParser(fp)
    doc = PDFDocument()
    parser.set_document(doc)
    doc.set_parser(parser)
    doc.initialize('')
    rsrcmgr = PDFResourceManager()
    laparams = LAParams()
    laparams.char_margin = 1.0
    laparams.word_margin = 1.0
    device = PDFPageAggregator(rsrcmgr, laparams=laparams)
    interpreter = PDFPageInterpreter(rsrcmgr, device)
    extracted_text = ''

    for page in doc.get_pages():
        interpreter.process_page(page)
        layout = device.get_result()
        for lt_obj in layout:
            if isinstance(lt_obj, LTTextBox) or isinstance(lt_obj, LTTextLine):
                extracted_text += lt_obj.get_text()

182

answered Sep 24 '22 23:09

Matphy

Related questions
                            
                                Why does the python pathlib Path('').exists() return True?
                            
                                Python - matplotlib - differences between subplot() and subplots()
                            
                                How to convert memoryview to bytes?
                            
                                Merging data based on matching first column in Python
                            
                                ValueError: Could not find a format to read the specified file in mode 'i'
                            
                                "pylint (import error)" while import a module in the same folder with VSCode
                            
                                One liner to "assign if not None"
                            
                                Recommended way of closing files using pathlib module?
                            
                                Python 3.9.8 fails using Black and importing `typed_ast.ast3`
                            
                                Python nonlocal statement in a class definition
                            
                                python 3 types module
                            
                                How do I subclass the build command?
                            
                                How do I download a file using urllib.request in Python 3?
                            
                                What is reference stealing and borrowing ?
                            
                                Catch Broken Pipe in Python 2 AND Python 3
                            
                                How to add elements to a List in python?
                            
                                Efficiently check if two numbers are co-primes (relatively primes)?
                            
                                with os.scandir() raises AttributeError: __exit__
                            
                                How to remove EOFError: EOF when reading a line?
                            
                                How to round float 0.5 up to 1.0, while still rounding 0.45 to 0.0, as the usual school rounding?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

How to read pdf file using pdfminer3k?

Tags:

python-3.x

pdf-scraping

python-3.5

poshita singh

People also ask

1 Answers

Matphy

Recent Activity

Donate For Us