
How to use AcroTextExtractor.exe programmatically?

Tags:

adobe-reader

I'm trying to do batch text extraction from PDF files. I have tried many libraries, and Adobe Reader seems to be the most accurate text extractor for me.

I noticed a file named AcroTextExtractor.exe in the folder where Adobe Reader is installed. Its name seems promising, and googling it suggests that this file is part of the PDF-to-text conversion routine.

How can I call this file from the command line to do text extraction?

Marco Marsala asked Apr 09 '15 10:04



1 Answer

I wanted to use it too, for the same scenario.

I ran an experiment to see whether I could capture the command line used if Reader launches AcroTextExtractor.exe.

I took a large PDF and opened it in Adobe Acrobat Reader DC version 2018.009.20050. I then saved it as text (File | Save as other | Text), and while Reader was generating the text file (successfully), I checked all running processes in Task Manager, in Sysinternals Process Explorer, and with WMI in PowerShell.

Unfortunately, I couldn't find any process launched from a path that included AcroTextExtractor.exe, so I couldn't capture its command line.

It may well be a red herring.
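
For anyone who wants to repeat the process check described above, here is a minimal Python sketch of it. It is not a way to drive AcroTextExtractor.exe; it only polls the process list while Reader is saving as text and reports any process whose name, path, or command line mentions the file. It assumes Windows with `powershell` on the PATH and relies on the standard `Get-CimInstance Win32_Process` query. The names `find_matches`, `snapshot`, and `watch` are made up for this illustration.

```python
import json
import subprocess
import time

TARGET = "AcroTextExtractor.exe"

# PowerShell pipeline that dumps every running process, with its command line, as JSON.
PS_QUERY = (
    "Get-CimInstance Win32_Process | "
    "Select-Object ProcessId, Name, ExecutablePath, CommandLine | "
    "ConvertTo-Json -Compress"
)


def find_matches(processes, target=TARGET):
    """Return the records whose name, path, or command line mention target."""
    needle = target.lower()
    hits = []
    for proc in processes:
        fields = (proc.get("Name"), proc.get("ExecutablePath"), proc.get("CommandLine"))
        # Fields can be None (e.g. for protected processes), so fall back to "".
        if any(needle in (value or "").lower() for value in fields):
            hits.append(proc)
    return hits


def snapshot():
    """Take one snapshot of all running processes (Windows only)."""
    out = subprocess.run(
        ["powershell", "-NoProfile", "-Command", PS_QUERY],
        capture_output=True, text=True, check=True,
    ).stdout
    data = json.loads(out) if out.strip() else []
    # ConvertTo-Json emits a bare object instead of a list for a single result.
    return data if isinstance(data, list) else [data]


def watch(seconds=30.0, interval=0.2):
    """Poll for `seconds` and print each matching process once."""
    seen = set()
    deadline = time.monotonic() + seconds
    while time.monotonic() < deadline:
        for proc in find_matches(snapshot()):
            if proc["ProcessId"] not in seen:
                seen.add(proc["ProcessId"])
                print(proc["ProcessId"], proc.get("CommandLine"))
        time.sleep(interval)
```

Start the save in Reader, then call `watch()` from a Python prompt. Polling can miss a process that starts and exits between two snapshots, so an empty result is not proof that the executable never runs.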

wisemoth answered Nov 02 '22 15:11
