I'm working on a custom search engine for my PDF corpus. I have a transformation layer that dumps PDF content to text (using Apache Tika and GROBID). I have finished the search layer and the view that lists the search results. Now I'd like to add highlighting to the original PDFs for the lines where the search terms appear. Yes, I'm willing to modify the PDF files if necessary. Is there any way to highlight text inside a PDF file? Can PDFMiner, PyPDF2, or another Python library do that? Or can you recommend something else, perhaps an external service?

Highlight text in a PDF with Python [closed]

1 Answer

You can highlight text using PyPDF2.
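
As a minimal sketch of what that could look like, assuming PyPDF2 2.x/3.x (where `PdfWriter.add_annotation` is available) and that you already know the rectangle to mark, in PDF user-space points: a highlight is an annotation dictionary with `/Subtype /Highlight`, a `/Rect`, and `/QuadPoints`. The helper names `highlight_fields` and `add_highlight`, the yellow default colour, and the file paths are illustrative and not part of PyPDF2.

```python
def highlight_fields(x1, y1, x2, y2, color=(1.0, 1.0, 0.0)):
    """Return the annotation entries for a highlight over one box.

    Coordinates are PDF user-space points (origin at the bottom-left).
    This is a hypothetical helper, not a PyPDF2 function.
    """
    left, right = sorted((x1, x2))
    bottom, top = sorted((y1, y2))
    return {
        "/Type": "/Annot",
        "/Subtype": "/Highlight",
        "/F": 4,  # annotation flag 4 = Print
        "/C": list(color),  # RGB components in the range 0..1
        "/Rect": [left, bottom, right, top],
        # Corner order top-left, top-right, bottom-left, bottom-right,
        # which is the order common viewers expect in practice.
        "/QuadPoints": [left, top, right, top, left, bottom, right, bottom],
    }


def add_highlight(src_path, dst_path, page_number, box):
    """Copy src_path to dst_path with one highlight added on page_number."""
    # Imported here so highlight_fields stays usable without PyPDF2 installed.
    from PyPDF2 import PdfReader, PdfWriter
    from PyPDF2.generic import (
        ArrayObject, DictionaryObject, FloatObject, NameObject, NumberObject,
    )

    # Convert the plain Python values into PyPDF2's PDF object types.
    annotation = DictionaryObject()
    for key, value in highlight_fields(*box).items():
        if isinstance(value, str):
            pdf_value = NameObject(value)
        elif isinstance(value, list):
            pdf_value = ArrayObject([FloatObject(v) for v in value])
        else:
            pdf_value = NumberObject(value)
        annotation[NameObject(key)] = pdf_value

    reader = PdfReader(src_path)
    writer = PdfWriter()
    for page in reader.pages:
        writer.add_page(page)
    writer.add_annotation(page_number=page_number, annotation=annotation)
    with open(dst_path, "wb") as fh:
        writer.write(fh)
```

Usage would be something like `add_highlight("in.pdf", "out.pdf", 0, (72, 700, 300, 712))`. Older PyPDF2 1.x releases use different class and method names (`PdfFileReader`, `PdfFileWriter`), so the second function would need adapting there.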

In order to find the text's location, check out this answer.
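
The linked answer is not reproduced here. One possible way to get line positions, sketched under the assumption that pdfminer.six is installed, is to walk its layout objects and keep the bounding box of every text line that contains the search term. The function names `iter_text_lines` and `find_matches` are hypothetical helpers; `extract_pages`, `LTTextContainer`, `LTTextLine`, `get_text()` and `bbox` come from pdfminer.six.

```python
def iter_text_lines(pdf_path):
    """Yield (page_number, text, bbox) for every text line in the PDF."""
    # Imported here so find_matches can be used without pdfminer.six installed.
    from pdfminer.high_level import extract_pages
    from pdfminer.layout import LTTextContainer, LTTextLine

    for page_number, layout in enumerate(extract_pages(pdf_path)):
        for element in layout:
            if not isinstance(element, LTTextContainer):
                continue  # skip figures, curves, images, ...
            for line in element:
                if isinstance(line, LTTextLine):
                    # bbox is (x0, y0, x1, y1), origin at the bottom-left
                    yield page_number, line.get_text(), line.bbox


def find_matches(lines, term):
    """Return (page_number, bbox) for lines containing term, ignoring case."""
    needle = term.lower()
    return [
        (page_number, bbox)
        for page_number, text, bbox in lines
        if needle in text.lower()
    ]
```

`find_matches(iter_text_lines("in.pdf"), "search term")` would then give the page index and rectangle of each matching line. This simple version assumes unrotated pages and would miss a term that is split across two lines.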

162

answered Sep 17 '22 14:09

spacevillain

Tags: python, search, pdf, pypdf, pdfminer

Katharsis
