Select only first page of PDF pypdf2

I am trying to extract only the first page from each of many PDF files and combine those pages into one file. I receive 150 PDF files a day; the first page of each is the invoice, which I need, and the following 3 to 12 pages are backup, which I do not need. So the input is 150 PDF files of varying length, and the output I want is one PDF file containing only the first page of each of the 150 files.

What I seem to have done instead is merge all the pages except the first page, which is the only one I need.

# Get all PDF documents in current directory
import os

pdf_files = []
for filename in os.listdir("."):
    if filename.endswith(".pdf"):
        pdf_files.append(filename)
pdf_files.sort(key=str.lower)

# Take first page from each PDF
from PyPDF2 import PdfFileWriter, PdfFileReader

for filename in pdf_files:
    reader = PdfFileReader(filename)

writer = PdfFileWriter()
for pageNum in range(1, reader.numPages):
    page = reader.getPage(pageNum)
    writer.addPage(page)

with open("CombinedFirstPages.pdf", "wb") as fp:
    writer.write(fp)

asked Nov 05 '17 19:11

mike horan

1 Answer

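Try this. In your code the page loop runs only after the file loop has finished, so it sees just the last reader, and range(1, reader.numPages) starts at index 1, which skips page 0 (the first page). Create the writer once, then take getPage(0) from each file inside the loop: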

# Get all PDF documents in the target folder (os.walk also descends into subfolders)
import os

your_target_folder = "."
output_name = "CombinedFirstPages.pdf"
pdf_files = []
for dirpath, _, filenames in os.walk(your_target_folder):
    for name in filenames:
        file_full_path = os.path.abspath(os.path.join(dirpath, name))
        # Skip the output file so the result of a previous run is not merged back in
        if name.lower().endswith(".pdf") and name != output_name:
            pdf_files.append(file_full_path)
pdf_files.sort(key=str.lower)

# Take first page from each PDF
from PyPDF2 import PdfFileReader, PdfFileWriter

# Create the writer once, outside the loop, so pages from every file accumulate
writer = PdfFileWriter()

for file_path in pdf_files:
    reader = PdfFileReader(file_path)
    page = reader.getPage(0)  # pages are zero-indexed, so 0 is the first page
    writer.addPage(page)

with open(output_name, "wb") as output:
    writer.write(output)
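
The PdfFileReader/PdfFileWriter names above belong to older PyPDF2 releases; PyPDF2 3.0 dropped them, and the project continues as pypdf. If you are on a newer install, the same idea might look roughly like the sketch below. The helper names list_pdfs and combine_first_pages are made up for illustration, and the sketch assumes the PDFs sit directly in one folder (no subfolders) and that pypdf is installed.

```python
from pathlib import Path


def list_pdfs(folder=".", exclude="CombinedFirstPages.pdf"):
    """Return the PDF paths directly inside `folder`, sorted case-insensitively.

    `exclude` is a hypothetical parameter: it keeps the combined output file
    from being picked up again on a second run.
    """
    paths = [
        p
        for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf" and p.name != exclude
    ]
    return sorted(paths, key=lambda p: p.name.lower())


def combine_first_pages(pdf_paths, output_path="CombinedFirstPages.pdf"):
    """Write page 0 of every PDF in `pdf_paths` into a single output file."""
    # Imported here so that list_pdfs() can be used without pypdf installed.
    from pypdf import PdfReader, PdfWriter

    writer = PdfWriter()  # one writer shared by all input files
    for path in pdf_paths:
        reader = PdfReader(str(path))
        if len(reader.pages) > 0:  # guard against a PDF with no pages
            writer.add_page(reader.pages[0])
    with open(output_path, "wb") as fp:
        writer.write(fp)


# Example use (commented out so that importing this file has no side effects):
# combine_first_pages(list_pdfs("."))
```

Splitting the file listing from the merging keeps each step easy to check by hand, for example by printing list_pdfs(".") before writing anything.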

answered Oct 13 '22 01:10

DRPK

Tags: python, merge, split, pdf, pypdf2
