I'm trying to do batch text extraction from PDF files. I've tried many libraries, and Adobe Reader seems to be the most accurate text extractor for me.
I noticed a file named AcroTextExtractor.exe in the folder where Adobe Reader is installed. Its name seems promising, and googling it suggests the file is part of the PDF-to-text conversion routine.
How can I call this file from the command line to do text extraction?
I've wanted to use that too for the same scenario.
I ran an experiment to see whether I could capture the command line used when AcroTextExtractor.exe is launched.
I took a large PDF and opened it in Adobe Acrobat Reader DC version 2018.009.20050. I then saved it as text (File | Save as other | Text), and while Reader was generating the text file (successfully) I checked all running processes in Task Manager, in Sysinternals Process Explorer, and with WMI in PowerShell.
Unfortunately, I couldn't find any process launched with a path that included AcroTextExtractor.exe, so I couldn't grab the command line.
It may well be a red herring.
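If anyone wants to repeat the experiment, polling the process list in a loop may be more reliable than checking by eye, since a short-lived helper process could come and go between looks. Below is a minimal Python sketch of that idea, not a tested recipe. It assumes Windows with PowerShell on the PATH and queries the same WMI class (Win32_Process) through Get-CimInstance. The function names (snapshot, find_matches, watch) and the polling interval are my own choices for illustration.

```python
import json
import subprocess
import time

# Substring to look for in process names and command lines (case-insensitive).
NEEDLE = "acrotextextractor"

# PowerShell query returning every process with its PID, name and command line.
PS_QUERY = (
    "Get-CimInstance Win32_Process | "
    "Select-Object ProcessId, Name, CommandLine | ConvertTo-Json -Compress"
)


def snapshot():
    """Return the current process list as a list of dicts (Windows only)."""
    out = subprocess.run(
        ["powershell", "-NoProfile", "-Command", PS_QUERY],
        capture_output=True, text=True, errors="replace", check=True,
    ).stdout
    data = json.loads(out) if out.strip() else []
    # ConvertTo-Json emits a single object rather than an array for one result.
    return data if isinstance(data, list) else [data]


def find_matches(processes, needle=NEEDLE):
    """Keep the processes whose name or command line contains the needle."""
    needle = needle.lower()
    hits = []
    for proc in processes:
        # CommandLine can be null for processes we are not allowed to inspect.
        text = f"{proc.get('Name') or ''} {proc.get('CommandLine') or ''}"
        if needle in text.lower():
            hits.append(proc)
    return hits


def watch(seconds=60.0, interval=0.5):
    """Poll for the given time and return {pid: command line} for every hit."""
    seen = {}
    deadline = time.monotonic() + seconds
    while time.monotonic() < deadline:
        for proc in find_matches(snapshot()):
            seen[proc["ProcessId"]] = proc.get("CommandLine")
        time.sleep(interval)
    return seen
```

To use it, start `watch()` in a Python session, then trigger File | Save as other | Text in Reader and print the returned dictionary once the export finishes. Each snapshot takes a noticeable fraction of a second, so a very brief process could still be missed; an empty result is therefore not proof that the executable is never launched.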