Maintained alternatives to PyPDF2

Tags:

python

pdf

pypdf2

I'm using the PyPDF2 library to extract text, images, page widths and heights, annotations, and other attributes from PDF documents. However, the library has many bugs and open issues, and it does not seem to have been maintained for a long time. (edit: PyPDF2 is maintained again)

Is there a more active fork that is being maintained and developed?
Is there a good alternative?

From what I know, reportlab is more suitable for creating brand-new PDFs (or maybe I'm just not experienced enough with reportlab).

216

asked Jul 31 '20 22:07

Peter Franek

2 Answers

Update: PyPDF2 is maintained again - and I am the maintainer :-) I've just released a new version with several bugfixes.

Three potential alternatives which are maintained (just like PyPDF2):

pymupdf: Uses MuPDF (only for open-source projects, due to the MuPDF license)
pikepdf: Uses qpdf
pdfminer.six: A pure Python project.

I would not use:

PyPDF3 (pypi): Has less activity and probably fewer features than PyPDF2.
PyPDF4 (pypi): Last release on PyPI in 2018

112

answered Oct 21 '22 06:10

Martin Thoma

PyMuPDF describes itself as follows: "PyMuPDF is a Python binding for MuPDF – a lightweight PDF and XPS viewer. Because MuPDF supports not only PDF but also XPS, OpenXPS, CBZ, CBR, FB2, and EPUB formats, so does PyMuPDF. PyMuPDF is hosted on GitHub. We also are registered on PyPI."

Its published performance comparisons are also very promising. They cover three aspects of performance:

document parsing
text extraction
image rendering

In those comparisons, PyMuPDF is faster than pdfrw, PyPDF2, and pdftk.

answered Oct 21 '22 06:10

Vishal Singh
