I am looking to extract text together with its font details (style, size, color, italic, etc.) from a PDF in Python.
I need the text and its metadata for translation purposes. Can anyone suggest a library for this?
There is a Python library for that. Have a look at PDFMiner:
http://www.unixuser.org/~euske/python/pdfminer/index.html
Its pdf2txt.py tool extracts the text from a PDF and can also report other information, such as the font and font size.
You can try that.
Note: PDFMiner does not support Python 3.
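For anyone on Python 3, here is a rough sketch of the same idea using pdfminer.six, the community-maintained fork of PDFMiner (that swap is my assumption, not something the answer above covers). It walks the layout tree and groups consecutive characters that share a font name and size. The names Run, describe_font, iter_chars and group_runs are made up for this example, and guessing bold/italic from the font name is only a heuristic that depends on how the PDF names its fonts. Color is not handled here.

```python
from __future__ import annotations

from itertools import groupby
from typing import Iterable, Iterator, NamedTuple


class Run(NamedTuple):
    """A stretch of text that shares one font and size (hypothetical helper type)."""
    text: str
    font: str
    size: float
    bold: bool
    italic: bool


def describe_font(fontname: str) -> tuple[str, bool, bool]:
    """Strip a subset prefix such as 'ABCDEF+' and guess bold/italic from the name."""
    name = fontname.split("+", 1)[-1]
    lowered = name.lower()
    bold = "bold" in lowered
    italic = "italic" in lowered or "oblique" in lowered
    return name, bold, italic


def iter_chars(pdf_path: str) -> Iterator[tuple[str, str, float]]:
    """Yield (character, fontname, size) for each character pdfminer.six lays out."""
    # Imported lazily so the pure helpers work without pdfminer.six installed.
    from pdfminer.high_level import extract_pages
    from pdfminer.layout import LTAnno, LTChar, LTTextContainer, LTTextLine

    for page in extract_pages(pdf_path):
        for element in page:
            if not isinstance(element, LTTextContainer):
                continue  # skip figures, lines, rectangles, ...
            for line in element:
                if not isinstance(line, LTTextLine):
                    continue
                last = None  # font and size of the previous real character
                for obj in line:
                    if isinstance(obj, LTChar):
                        last = (obj.fontname, round(obj.size, 1))
                        yield obj.get_text(), last[0], last[1]
                    elif isinstance(obj, LTAnno) and last is not None:
                        # Inferred spaces/newlines carry no font; reuse the previous one.
                        yield obj.get_text(), last[0], last[1]


def group_runs(chars: Iterable[tuple[str, str, float]]) -> list[Run]:
    """Merge consecutive characters that have the same font name and size."""
    runs = []
    for (font, size), items in groupby(chars, key=lambda c: (c[1], c[2])):
        text = "".join(c[0] for c in items)
        name, bold, italic = describe_font(font)
        runs.append(Run(text, name, size, bold, italic))
    return runs
```

Usage would be something like `for run in group_runs(iter_chars("input.pdf")): print(run)`, where input.pdf is a placeholder path. Each run can then be sent for translation with its font details attached.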