Edit existing PDF's pages in Python

Tags: python, pdf

I have a PDF file from which I removed some pages. I want to correct the page numbers in the new PDF. Is there a way or library to update the page numbers without converting the PDF to another format? I have tried converting the PDF to text, XML, and JSON and fixing the page numbers there. However, when I convert it back to PDF, it looks messy (the style of the original PDF is lost). The problems I have are:

Removing the old page numbers.
Adding new page numbers.

I am using Python on Ubuntu. I have tried ReportLab, PyX, and pyfpdf.

asked Jun 25 '19 18:06

Sina

1 Answer

I had a similar problem and, honestly, could not fully solve it; in the end I fetched the corresponding HTML and processed it with BeautifulSoup. However, I did get closer with an external tool than with Python modules: I used pdftotext.exe from poppler (link at the bottom) to read the PDF file, and it worked just fine, except that it could not distinguish between text columns. Since it is not a Python module, I called the .exe as an external command from Python.

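import os
import subprocess

def call_poppler(input_pdf, input_path):
    """
    Call poppler's pdftotext to generate a txt file.
    input_path is the path of the pdftotext executable,
    input_pdf is the path of the PDF to read.
    """
    # subprocess.run with an argument list is used instead of os.system
    # so that paths containing spaces are passed intact
    subprocess.run([input_path, input_pdf], check=True)
    # by default pdftotext writes <name>.txt next to the input PDF
    txt_name = os.path.splitext(input_pdf)[0] + ".txt"
    processed_paper = open_txt(txt_name)
    return processed_paper

def open_txt(input_txt_name):
    """
    Open the txt file produced by poppler and return
    its lines as a list of stripped strings
    """
    with open(input_txt_name, encoding="utf-8") as opened_file:
        output_file = [row.strip() for row in opened_file]
    return output_file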
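For the column problem mentioned above, one variation that may help is pdftotext's -layout option, which keeps the physical layout of the page, so columns stay side by side on the same line instead of being merged into reading order. This is only a sketch: build_pdftotext_command and pdf_to_text are hypothetical helper names, and the executable is assumed to be called pdftotext and to be on PATH unless a full path is passed.

```python
import shutil
import subprocess


def build_pdftotext_command(pdf_path, txt_path, exe="pdftotext", keep_layout=True):
    """Build the argument list for poppler's pdftotext (hypothetical helper)."""
    command = [exe]
    if keep_layout:
        # -layout asks pdftotext to keep the physical page layout
        command.append("-layout")
    command += [pdf_path, txt_path]
    return command


def pdf_to_text(pdf_path, txt_path, exe="pdftotext"):
    """Run pdftotext and return the raw text, or None if the tool is missing."""
    if shutil.which(exe) is None:
        return None  # assumption: exe is on PATH or given as a full path
    subprocess.run(build_pdftotext_command(pdf_path, txt_path, exe), check=True)
    with open(txt_path, encoding="utf-8") as txt_file:
        return txt_file.read()
```

Whether the -layout output is easier to work with depends on the document, so it is worth comparing both outputs on a few pages first.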

call_poppler returns the contents of the generated .txt file as a list of lines, which you can then process as you want and write back out as a PDF with some module, such as pypdf. Sorry if this is not the answer you wanted, but PDF files are rather hard to handle in Python since they are not text-based files. Do not forget to pass the path of the executable. You can get poppler here: https://poppler.freedesktop.org/
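If you go the text route, the two steps from the question (remove the old page numbers, add new ones) could look roughly like this. It rests on two assumptions that need checking against your file: that the text is pdftotext's default output, where pages are separated by a form feed character ("\f"), and that the old page number is the last non-empty line of each page and consists only of digits. renumber_pages is a hypothetical name. Note that open_txt strips every row, which also removes the form feeds, so this sketch works on the raw text read in one piece.

```python
def renumber_pages(raw_text, first_number=1):
    """Replace trailing digit-only page numbers, page by page (sketch)."""
    pages = raw_text.split("\f")
    # pdftotext ends every page with a form feed, so the last piece is empty
    if pages and not pages[-1].strip():
        pages.pop()

    renumbered = []
    for offset, page in enumerate(pages):
        lines = page.rstrip().splitlines()
        # assumption: a digit-only last line is the old page number
        if lines and lines[-1].strip().isdigit():
            lines.pop()
        lines.append(str(first_number + offset))
        renumbered.append("\n".join(lines))
    return "\f".join(renumbered)
```

The result is still plain text, so writing it back to a PDF will not restore the styling of the original file.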

answered Oct 26 '22 10:10

Preto
