Tesseract does not recognize German "für"

Question

I use Tesseract 4.0 via the Docker image tesseractshadow/tesseract4re.

I pass the option -l=deu to tell Tesseract that the text is in German ("Deutsch").

Still, the result for the German word "für" is poor. The word is very common (it means "for" in English).

Tesseract often outputs "fiir" or "fur" instead.

What can I do to improve this?

Reproducible example

docker run --name self.container_name --rm \
    --volume  $PWD:/pwd \
    tesseractshadow/tesseract4re \
    tesseract /pwd/die-fuer-das.png /pwd/die-fuer-das.png.ocr-result -l=deu

Result:

cat die-fuer-das.png.ocr-result.txt 
die fur das

Image die-fuer-das.png: (not reproduced here; it shows the text "die für das")

guettli · Accepted Answer

I found the solution: the option must be written as -l deu (with a space, not an equals sign); otherwise the German language data is not used. I had accidentally written -l=deu.

Works:

===> tesseract  die-fuer-das.png out  -l deu; cat out.txt
Tesseract Open Source OCR Engine v4.0.0-beta.1-262-g555f with Leptonica
die für das

Wrong language:

===> tesseract  die-fuer-das.png out  -l=deu; cat out.txt
Tesseract Open Source OCR Engine v4.0.0-beta.1-262-g555f with Leptonica
die fur das
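
Applied to the Docker command from the question, the fix might look like the sketch below. The function names ocr_deu and list_langs are hypothetical helpers introduced here for illustration, not part of Tesseract or the image. The --list-langs call is an extra sanity check (not in the original post) and assumes the image lets you run the tesseract binary directly, as the question's command does.

```shell
# Hypothetical helper: run OCR on an image in the current directory,
# passing the language as a separate argument ("-l deu", not "-l=deu").
ocr_deu() {
    # $1: image file name relative to $PWD, e.g. die-fuer-das.png
    docker run --rm \
        --volume "$PWD":/pwd \
        tesseractshadow/tesseract4re \
        tesseract "/pwd/$1" "/pwd/$1.ocr-result" -l deu
}

# Hypothetical helper: list the language data installed in the image,
# to confirm that "deu" is available at all.
list_langs() {
    docker run --rm tesseractshadow/tesseract4re tesseract --list-langs
}
```

Calling ocr_deu die-fuer-das.png should then write die-fuer-das.png.ocr-result.txt next to the image, as in the question.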

Tags:

ocr

tesseract
