How can I read pdf in python? I know one way of converting it to text, but I want to read the content directly from pdf. Can anyone explain which module in python is best for pdf extraction

You can use textract module in python Textract for install <pre class="prettyprint"><code>pip install textract </code></pre> for read pdf <pre class="prettyprint"><code>import textract text = textract.process('path/to/pdf/file', method='pdfminer') </code></pre> For detail Textract

How can I read pdf in python? [duplicate]

2 Answers

You can USE PyPDF2 package

#install pyDF2
pip install PyPDF2

# importing all the required modules
import PyPDF2

# creating an object 
file = open('example.pdf', 'rb')

# creating a pdf reader object
fileReader = PyPDF2.PdfFileReader(file)

# print the number of pages in pdf file
print(fileReader.numPages)

Follow this Documentation [~~http://pythonhosted.org/PyPDF2/~~] https://pypdf2.readthedocs.io/en/latest/

148

answered Oct 12 '22 21:10

shankarj67

You can use textract module in python

Textract

for install

pip install textract

for read pdf

import textract
text = textract.process('path/to/pdf/file', method='pdfminer')

For detail Textract

answered Oct 12 '22 21:10

Kallz

Related questions
                            
                                How do I select and store columns greater than a number in pandas?
                            
                                Django error. Cannot assign must be an instance
                            
                                Recursively access dict via attributes as well as index access?
                            
                                boost::python: Python list to std::vector
                            
                                Python: How do I make temporary files in my test suite?
                            
                                Best way for a beginner to learn screen scraping by Python [closed]
                            
                                List directories with a specified depth in Python
                            
                                Python : 2d contour plot from 3 lists : x, y and rho?
                            
                                correct and efficient way to flatten array in numpy in python? [duplicate]
                            
                                Selenium - Click at certain position
                            
                                fminunc alternate in numpy
                            
                                How to provide Python syntax coloring inside Webstorm?
                            
                                replace the punctuation with whitespace
                            
                                Downloading multiple attachments using imaplib
                            
                                Parsing crontab-style lines
                            
                                Convert UUID 32-character hex string into a "YouTube-style" short id and back
                            
                                Read CSV file to numpy array, first row as strings, rest as float
                            
                                Highest Posterior Density Region and Central Credible Region
                            
                                Set no title for pandas boxplot (groupby)
                            
                                Read CSV items with column name

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

How can I read pdf in python? [duplicate]

Tags:

python

pdf

python-2.7

text-extraction

sg1994

People also ask

2 Answers

shankarj67

Kallz

Recent Activity

Donate For Us