Where is the file needed for PDFTOTEXT output in UTF-8 format?

Question

I want to use the XPDF-based PDFTOTEXT command-line tool to look at PDF files and get UTF-8 output. Others on StackOverflow appear to have managed this: questions 4039930, 3809761 and 13618330 all show it in use.

When I use the option -enc utf-8 these messages are displayed:

Syntax Error: Couldn't find unicodeMap file for the 'utf-8' encoding
Config Error: Couldn't get text encoding

I've seen documentation stating that UTF-8 (among other encodings) is "predefined", but I cannot find the file that I need to point to. (I've looked through several different downloads of XPDF-based software and have not found it yet.)

Any pointers would be appreciated.

EDIT: I am on Windows.

Artem Klevtsov · Accepted Answer

You should use UTF-8 instead of utf-8: the encoding name is matched case-sensitively, so no extra file is needed. See the list of encodings that pdftotext prints:

$ pdftotext -listenc
Available encodings are:
UCS-2
ASCII7
Latin1
UTF-8
ZapfDingbats
Symbol

Proof (the lowercase name fails, the uppercase name succeeds):

$ pdftotext -eol unix -nopgbrk -layout -enc utf-8 file.pdf
Syntax Error: Couldn't find unicodeMap file for the 'utf-8' encoding
Command Line Error: Couldn't get text encoding
$ pdftotext -eol unix -nopgbrk -layout -enc UTF-8 file.pdf
$ echo $?
0
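
Since the only difference between the failing and the working command is the case of the encoding name, a script can guard against this by looking up the exact spelling first. The following is a minimal sketch for a POSIX shell, not part of pdftotext itself: `resolve_enc` is a hypothetical helper name, and the list of names passed to it is assumed to come from the `-listenc` output shown above.

```shell
# Hypothetical helper (not part of pdftotext): print the entry from a list
# of encoding names that matches the requested name when case is ignored.
# Usage: resolve_enc WANTED NAME...
resolve_enc() {
  wanted_uc=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')
  shift
  for name in "$@"; do
    name_uc=$(printf '%s' "$name" | tr '[:lower:]' '[:upper:]')
    if [ "$name_uc" = "$wanted_uc" ]; then
      # Print the spelling exactly as it appears in the list.
      printf '%s\n' "$name"
      return 0
    fi
  done
  # No entry matched, even when ignoring case.
  return 1
}
```

With the names from the listing above, `resolve_enc utf-8 UCS-2 ASCII7 Latin1 UTF-8 ZapfDingbats Symbol` prints `UTF-8`, which can then be passed to `-enc`. On Windows this would need a POSIX-style shell; in cmd.exe the simpler fix is just to type the name in upper case.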

Tags: utf-8, pdftotext

J.Merrill
