 

Extract comments from pdf

Tags: r, pdf

I have a collection of .pdf files with comments that were added in Adobe Acrobat. I would like to be able to analyze these comments, but I'm kind of stuck on extracting them. I've looked at the pdftools package, but it seems to only be able to extract the text and not the comments. Is there a method available for extracting the comments within R?

Robert Bradford asked Jun 11 '18 15:06



1 Answer

PyMuPDF (https://pymupdf.readthedocs.io/en/latest/) is the only Python library I have found that works for this.

Installation in Debian/Ubuntu-based distributions:

apt-get install python3-fitz

Script:

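import fitz  # PyMuPDF

doc = fitz.open("example.pdf")
# Iterate over the pages directly; this avoids doc.pageCount,
# which is named doc.page_count in newer PyMuPDF releases
for page in doc:
    for annot in page.annots():
        # The comment text is stored under the "content" key
        print(annot.info["content"])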
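Since the goal is to analyze the comments in R, it may help to write them to a CSV file instead of printing them. The following is only a rough sketch along those lines: the function names (collect_comments, write_csv, export_comments) and the column names are made up for illustration, and it assumes the comment author is stored under the "title" key of annot.info.

```python
import csv


def collect_comments(doc):
    """Return one dict per annotation found in an opened PyMuPDF document."""
    rows = []
    for page_number, page in enumerate(doc, start=1):
        for annot in page.annots():
            info = annot.info  # dict with keys such as "content" and "title"
            rows.append({
                "page": page_number,
                # Assumption: the "title" key holds the comment author
                "author": info.get("title", ""),
                "comment": info.get("content", ""),
            })
    return rows


def write_csv(rows, stream):
    """Write the collected rows as CSV to an open text stream."""
    writer = csv.DictWriter(stream, fieldnames=["page", "author", "comment"])
    writer.writeheader()
    writer.writerows(rows)


def export_comments(pdf_path, csv_path):
    """Hypothetical helper: extract all comments from pdf_path into csv_path."""
    import fitz  # PyMuPDF; imported here so the helpers above work without it

    with fitz.open(pdf_path) as doc:
        rows = collect_comments(doc)
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        write_csv(rows, f)
```

After calling export_comments("example.pdf", "comments.csv"), the result can be loaded in R with read.csv("comments.csv").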
Bernuly answered Jan 09 '23 01:01
