I'm using the PyPDF2 library for extracting text, images, page widths and heights, annotations, and other attributes from PDF documents. However, the library has many bugs and issues and seems not to have been maintained for a long time. (Edit: PyPDF2 is maintained again.)
From what I know, reportlab is more suitable for creating brand-new PDFs (or maybe I'm just not experienced enough with reportlab).
PyPDF2: a Python library for common tasks on PDF files, such as extracting document-specific information, merging PDF files, splitting the pages of a PDF file, adding watermarks, and encrypting and decrypting PDF files.
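As a rough illustration of two of the tasks listed above (reading page dimensions and merging files), here is a minimal sketch. It assumes PyPDF2 2.0 or later, where `PdfReader`, `PdfWriter`, `reader.pages`, and `page.mediabox` are available; the function names `page_sizes` and `merge_pdfs` are made up for this example and are not part of the library.

```python
def page_sizes(source):
    """Return a (width, height) tuple in PDF points for every page.

    `source` is a file path or a binary stream. Hypothetical helper,
    not part of PyPDF2 itself.
    """
    # Imported lazily so the module loads even without PyPDF2 installed.
    from PyPDF2 import PdfReader  # assumes PyPDF2 >= 2.0

    reader = PdfReader(source)
    return [
        (float(page.mediabox.width), float(page.mediabox.height))
        for page in reader.pages
    ]


def merge_pdfs(sources, destination):
    """Append all pages of each PDF in `sources` and write one output file.

    `destination` is a file path. Hypothetical helper, not part of PyPDF2.
    """
    from PyPDF2 import PdfReader, PdfWriter  # assumes PyPDF2 >= 2.0

    writer = PdfWriter()
    for source in sources:
        for page in PdfReader(source).pages:
            writer.add_page(page)

    # Open the output explicitly so this works across PyPDF2 versions.
    with open(destination, "wb") as handle:
        writer.write(handle)
```

Something like `merge_pdfs(["a.pdf", "b.pdf"], "combined.pdf")` followed by `page_sizes("combined.pdf")` would then list the size of every page in the merged file. Splitting works the same way in reverse: add a single page to a fresh `PdfWriter` and write it out.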
Update: PyPDF2 is maintained again - and I am the maintainer :-) I've just released a new version with several bugfixes.
Three potential alternatives which are maintained (just like PyPDF2):
pymupdf: uses MuPDF (only for open source due to the MuPDF license)
pikepdf: uses qpdf
pdfminer.six: a pure Python project
I would not use:
PyMuPDF is a Python binding for MuPDF – a lightweight PDF and XPS viewer. Because MuPDF supports not only PDF but also XPS, OpenXPS, CBZ, CBR, FB2, and EPUB formats, so does PyMuPDF. PyMuPDF is hosted on GitHub. We also are registered on PyPI.
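For the attributes mentioned in the question (text, page width and height, annotations), a PyMuPDF version might look roughly like the sketch below. It assumes a reasonably recent PyMuPDF that is importable as `fitz` and provides `page.get_text()`; the function name `summarize_pdf` and the dictionary keys are invented for illustration.

```python
def summarize_pdf(path):
    """Collect size, text, and annotation types for each page of a document.

    Hypothetical helper built on PyMuPDF; not part of the library.
    """
    # Imported lazily so the module loads even without PyMuPDF installed.
    import fitz  # PyMuPDF; assumes a version that provides page.get_text()

    summary = []
    with fitz.open(path) as doc:
        for page in doc:
            summary.append(
                {
                    "number": page.number,        # zero-based page index
                    "width": page.rect.width,     # in points
                    "height": page.rect.height,   # in points
                    "text": page.get_text(),      # plain-text extraction
                    # annot.type is a (code, name) pair; keep the name.
                    "annotations": [annot.type[1] for annot in page.annots()],
                }
            )
    return summary
```

Because PyMuPDF opens the other formats listed above through the same `fitz.open` call, the same function should in principle work for an XPS or EPUB file as well, although annotations are a PDF concept.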
Its performance stats are also very promising. Following are three sections that deal with different aspects of performance:
PyMuPDF is faster than pdfrw, PyPDF2, and pdftk.