 

Maintained alternatives to PyPDF2

Tags:

python

pdf

pypdf2

I'm using the PyPDF2 library to extract text, images, page widths and heights, annotations, and other attributes from PDF documents. However, the library has many bugs and open issues and does not seem to have been maintained for a long time. (Edit: PyPDF2 is maintained again.)

  • Is there a more active fork that is being maintained and developed?
  • Is there a good alternative?

From what I know, ReportLab is better suited to creating new PDFs from scratch (or maybe I'm just not experienced enough with ReportLab).

asked Jul 31 '20 at 22:07 by Peter Franek






2 Answers

Update: PyPDF2 is maintained again - and I am the maintainer :-) I've just released a new version with several bugfixes.


Three potential alternatives which are maintained (just like PyPDF2):

  • pymupdf: uses MuPDF (effectively limited to open-source projects because of MuPDF's AGPL license)
  • pikepdf: uses qpdf
  • pdfminer.six: a pure-Python project

I would not use:

  • PyPDF3 (PyPI): has less activity and probably fewer features than PyPDF2.
  • PyPDF4 (PyPI): last release on PyPI in 2018.
answered Oct 21 '22 at 06:10 by Martin Thoma



PyMuPDF is a Python binding for MuPDF, a lightweight PDF and XPS viewer. Because MuPDF supports not only PDF but also XPS, OpenXPS, CBZ, CBR, FB2, and EPUB, so does PyMuPDF. PyMuPDF is hosted on GitHub and is also available on PyPI.

Its performance figures are also very promising. They cover three different aspects of performance:

  • document parsing
  • text extraction
  • image rendering

In these comparisons, PyMuPDF is faster than pdfrw, PyPDF2, and pdftk.

answered Oct 21 '22 at 06:10 by Vishal Singh
