How to load the first 100 characters from a PNG with TrainingImageLoader

Tags:

java

ocr

I would like to load the first 100 characters from the PNG file or, if that is not possible, all of them.

The file is here: http://abatis.org.uk/projects/txt2fig.png

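    import java.awt.Image;
    import java.io.File;
    import java.util.ArrayList;
    import java.util.HashMap;
    import javax.imageio.ImageIO;
    import net.sourceforge.javaocr.ocrPlugins.mseOCR.*; // OCRScanner, TrainingImageLoader, TrainingImage, CharacterRange
    import net.sourceforge.javaocr.scanner.PixelImage;

    public class OcrTest {
        public static void main(String[] args) throws Exception {
            File fff = new File("C:\\Users\\lll\\Desktop\\txt2fig.png");
            OCRScanner scanner = new OCRScanner();
            TrainingImageLoader loader = new TrainingImageLoader();
            HashMap<Character, ArrayList<TrainingImage>> trainingImageMap = new HashMap<>();
            // Fails here: expects 26 glyphs ('A'..'Z') but decodes 911 from the image
            loader.load(fff.getAbsolutePath(), new CharacterRange('A', 'Z'), trainingImageMap);
            scanner.addTrainingImages(trainingImageMap);

            Image image = ImageIO.read(fff);
            // pixelImage is prepared but never used: scan() below receives the original image
            PixelImage pixelImage = new PixelImage(image);
            pixelImage.toGrayScale(true);
            pixelImage.filter();

            String text = scanner.scan(image, 0, 0, 0, 0, null);
            System.out.println(text);
        }
    }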

Exception:

java.io.IOException: Expected to decode 26 characters but actually decoded 911 characters in training: C:\Users\lll\Desktop\txt2fig.png
    at net.sourceforge.javaocr.ocrPlugins.mseOCR.TrainingImageLoader.load(TrainingImageLoader.java:107)
    at net.sourceforge.javaocr.ocrPlugins.mseOCR.TrainingImageLoader.load(TrainingImageLoader.java:83)

The dependencies in my pom:

        <dependency>
            <groupId>net.sourceforge.javaocr</groupId>
            <artifactId>javaocr-core</artifactId>
            <version>1.0</version>
        </dependency>
        <dependency>
            <groupId>net.sourceforge.javaocr.plugins</groupId>
            <artifactId>javaocr-plugin-awt</artifactId>
            <version>1.0</version>
        </dependency>

I know that the range in

new CharacterRange('A', 'Z')

should match the first and last characters in the file. Is there some way around this?

Asked Dec 14 '25 12:12 by LLL RRR


1 Answer

You have misunderstood how this tool works. You passed an image of ordinary text as the training image, but a training image should contain only the training characters, in ASCII order, covering the codes 0x20 to 0x7E (characters above this range may be added), so at least something like this:

 !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~

Note the space at the beginning of the training image.
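This is also where the numbers in the exception come from: the loader expects one glyph per character in the CharacterRange, so new CharacterRange('A', 'Z') means 'Z' - 'A' + 1 = 26 glyphs, while the full line from space to ~ is 95. The sketch below is plain Java with no JavaOCR calls; the class TrainingText and its method trainingString are hypothetical helpers that build the text you would render into a training image for a given range and report how many glyphs the loader should then find.

```java
public class TrainingText {

    /** Returns every character from first to last (inclusive), in ASCII order. */
    static String trainingString(char first, char last) {
        if (last < first) {
            throw new IllegalArgumentException("last must not precede first");
        }
        StringBuilder sb = new StringBuilder();
        for (char c = first; c <= last; c++) {
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String full = trainingString(' ', '~'); // 0x20 .. 0x7E, starting with a space
        System.out.println("[" + full + "]");
        // One glyph per character: this count must match what the loader decodes
        System.out.println(full.length() + " glyphs expected for ' '..'~'");        // 95
        System.out.println(trainingString('A', 'Z').length() + " glyphs expected for 'A'..'Z'"); // 26
    }
}
```

If the rendered image contains any other number of glyphs than the range implies, the loader reports a mismatch like the 26 versus 911 in the question.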

First, try analyzing the sample images and training images from the javaocr-20100605.zip/ocrTests/ directory, e.g. use trainingImages/hpljPica.jpg as the training image and hpljPicaSample.jpg as the image to analyze. Do this in the Mean Square OCR Recognizer tab of the Java OCR GUI (started with java -jar JavaOCR.jar).

Later you can try your own training image, composed from the image you want to analyze. For this, use the Character Extractor tab of the Java OCR GUI to extract the characters from that image, order the output files of the extracted characters by their ASCII codes, and compose your training image from them.
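The last step (ordering the extracted characters by ASCII code and joining them into one training image) can be sketched with the standard Java imaging API. This is a sketch under assumptions: it expects each extracted glyph to have been saved, or renamed, as a PNG named after its decimal ASCII code (for example 65.png for 'A'); that naming scheme, the class name ComposeTrainingImage and the 8-pixel gap are hypothetical, not something the Character Extractor is documented to produce. Glyphs are placed left to right on a white background with blank columns between them.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import javax.imageio.ImageIO;

public class ComposeTrainingImage {

    /**
     * Joins glyph images named "<asciiCode>.png" (hypothetical naming) from dir,
     * in ASCII order from first to last, into one horizontal strip with a
     * white gap of `gap` pixels around and between the glyphs.
     */
    static BufferedImage compose(File dir, char first, char last, int gap) throws IOException {
        List<BufferedImage> glyphs = new ArrayList<>();
        int width = gap;
        int height = 0;
        for (int code = first; code <= last; code++) {
            File f = new File(dir, code + ".png"); // e.g. 65.png for 'A'
            BufferedImage g = ImageIO.read(f);     // throws if the file is missing
            if (g == null) {
                throw new IOException("Not a readable image: " + f);
            }
            glyphs.add(g);
            width += g.getWidth() + gap;
            height = Math.max(height, g.getHeight());
        }

        BufferedImage out = new BufferedImage(width, height + 2 * gap, BufferedImage.TYPE_INT_RGB);
        Graphics2D gfx = out.createGraphics();
        gfx.setColor(Color.WHITE);                 // white background
        gfx.fillRect(0, 0, out.getWidth(), out.getHeight());
        int x = gap;
        for (BufferedImage g : glyphs) {           // draw glyphs in ASCII order
            gfx.drawImage(g, x, gap, null);
            x += g.getWidth() + gap;
        }
        gfx.dispose();
        return out;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java ComposeTrainingImage <glyphDir> <output.png>
        BufferedImage strip = compose(new File(args[0]), 'A', 'Z', 8);
        ImageIO.write(strip, "png", new File(args[1]));
    }
}
```

For example, java ComposeTrainingImage glyphs AZ.png would write a strip for 'A' to 'Z', which would then be loaded with new CharacterRange('A', 'Z') so that the glyph count and the range agree.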

The screenshots below show how to run OCR in the GUI and the results it produces.

[Screenshot: OCR with the Java OCR tool, training characters from space to ~]

[Screenshot: OCR results, showing some OCR errors]

As you can see, at least two recognition errors occurred, which is not many.

Answered Dec 16 '25 12:12 by neuroanimal


