I have a collection of .pdf files with comments that were added in Adobe Acrobat. I would like to be able to analyze these comments, but I'm kind of stuck on extracting them. I've looked at the pdftools package, but it seems to only be able to extract the text and not the comments. Is there a method available for extracting the comments within R?
PyMuPDF (https://pymupdf.readthedocs.io/en/latest/) is the only Python library I have found that works for this.
Installation in Debian/Ubuntu-based distributions:
apt-get install python3-fitz
Script:
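import fitz  # PyMuPDF

doc = fitz.open("example.pdf")
# Iterating over the document yields its pages; this avoids the
# older doc.pageCount attribute, which newer PyMuPDF releases dropped.
for page in doc:
    # page.annots() yields every annotation (comments, highlights, ...) on the page
    for annot in page.annots():
        print(annot.info["content"])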
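Since the goal is to analyze the comments afterwards, it may help to save them to a file instead of printing them. The following is a minimal sketch, not a definitive implementation: the helper names collect_comments and write_csv, the column names, and the output file comments.csv are my own choices for illustration. It assumes the comment author is stored under the "title" key of annot.info, which is where PyMuPDF puts it.

```python
import csv

FIELDS = ["page", "type", "author", "content"]


def collect_comments(doc):
    """Return one dict per annotation in an opened PyMuPDF document."""
    rows = []
    for page in doc:
        for annot in page.annots():
            info = annot.info
            rows.append({
                "page": page.number + 1,          # page.number is 0-based
                "type": annot.type[1],            # e.g. "Text", "Highlight"
                "author": info.get("title", ""),  # assumption: author is in "title"
                "content": info.get("content", ""),
            })
    return rows


def write_csv(rows, path):
    """Write the collected rows to a CSV file with a header line."""
    with open(path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)


if __name__ == "__main__":
    import fitz  # PyMuPDF

    doc = fitz.open("example.pdf")
    write_csv(collect_comments(doc), "comments.csv")
```

The resulting file can then be loaded in R with read.csv("comments.csv"), which gets the comments into a data frame even though the extraction itself happens in Python.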