Extracting entire pdf data with python pdfminer

Question

I am using pdfminer to extract data from PDF files in Python. I would like to extract all the data in a PDF, regardless of whether it is an image, text, or anything else. Can we do that in a single line (or two if needed, without much work)? Any help is appreciated. Thanks in advance.

alexis · Accepted Answer

Can we do that in a single line (or two if needed, without much work)?

No, you cannot. Pdfminer is powerful but it's rather low-level.

Unfortunately, the documentation is not exactly exhaustive. I was able to find my way around it thanks to some code by Denis Papathanasiou. The code is discussed in his blog, and you can find the source here: layout_scanner.py

See also this answer, where I give a little more detail.
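To give an idea of the wiring involved, here is a minimal sketch. It assumes pdfminer.six (the maintained fork) and its `extract_pages` helper from `pdfminer.high_level`, which is not necessarily the version the question was asked about. The names `walk_layout` and `extract_all` are my own, and the walker uses duck typing (`get_text()` for text objects, a `.stream` attribute for `LTImage`) rather than anything the library prescribes. It only collects text strings and image objects; saving images to disk or handling other content types takes more work.

```python
def walk_layout(element, texts, images):
    """Recursively collect text strings and image objects from a layout tree."""
    if hasattr(element, "get_text"):
        # Text boxes, text lines and characters all expose get_text().
        texts.append(element.get_text())
    elif hasattr(element, "stream"):
        # LTImage keeps the embedded image data in its .stream attribute.
        images.append(element)
    elif hasattr(element, "__iter__"):
        # Pages and figures are containers, so descend into their children.
        for child in element:
            walk_layout(child, texts, images)
    # Anything else (lines, rectangles, curves) is ignored in this sketch.


def extract_all(pdf_path):
    """Return (texts, images) gathered from every page of the PDF at pdf_path."""
    # Imported lazily so walk_layout can be tried without pdfminer.six installed.
    from pdfminer.high_level import extract_pages

    texts, images = [], []
    for page_layout in extract_pages(pdf_path):
        walk_layout(page_layout, texts, images)
    return texts, images
```

Calling `extract_all("some.pdf")` (a placeholder path) would then give you a list of text chunks and a list of image layout objects to process further.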

Tags: python, pdf-reader

sunil reddy