
How to improve digit recognition of a model trained on MNIST?

I am working on handprinted multi-digit recognition with Java, using the OpenCV library for preprocessing and segmentation, and a Keras model trained on MNIST (with an accuracy of 0.98) for recognition.

The recognition seems to work quite well, apart from one thing: the network quite often fails to recognize ones (the digit "1"). I can't figure out whether this is caused by the preprocessing or an incorrect implementation of the segmentation, or whether a network trained on standard MNIST just hasn't seen ones that look like my test cases.

Here's what the problematic digits look like after preprocessing and segmentation:

[image: original digit] becomes [image: segmented digit] and is classified as 4.

[image: original digit] becomes [image: segmented digit] and is classified as 7.

[image: original digit] becomes [image: segmented digit] and is classified as 4. And so on...

Is this something that could be fixed by improving the segmentation process? Or rather by enhancing the training set?

Edit: Enhancing the training set (data augmentation) would definitely help, and I am already testing that, but the question of correct preprocessing still remains.

My preprocessing consists of resizing, converting to grayscale, binarization, inversion, and dilation. Here's the code:

Mat resized = new Mat();
Imgproc.resize(image, resized, new Size(), 8, 8, Imgproc.INTER_CUBIC);

Mat grayscale = new Mat();
Imgproc.cvtColor(resized, grayscale, Imgproc.COLOR_BGR2GRAY);

Mat binImg = new Mat(grayscale.size(), CvType.CV_8U);
Imgproc.threshold(grayscale, binImg, 0, 255, Imgproc.THRESH_OTSU);

Mat inverted = new Mat();
Core.bitwise_not(binImg, inverted);

Mat dilated = new Mat(inverted.size(), CvType.CV_8U);
int dilation_size = 5;
Mat kernel = Imgproc.getStructuringElement(Imgproc.CV_SHAPE_CROSS, new Size(dilation_size, dilation_size));
Imgproc.dilate(inverted, dilated, kernel, new Point(-1,-1), 1);

The preprocessed image is then segmented into individual digits as follows:

List<Mat> digits = new ArrayList<>();
List<MatOfPoint> contours = new ArrayList<>();
Imgproc.findContours(preprocessed.clone(), contours, new Mat(), Imgproc.RETR_EXTERNAL, Imgproc.CHAIN_APPROX_SIMPLE);

// code to sort contours
// code to check that contour is a valid char

// Typed list, so no cast is needed when reading the rectangles back
List<Rect> rects = new ArrayList<>();

for (MatOfPoint contour : contours) {
    // boundingRect already returns a new Rect, so it can be stored directly
    rects.add(Imgproc.boundingRect(contour));
}

for (int i = 0; i < rects.size(); i++) {
    Rect x = rects.get(i);
    // Crop the digit region out of the preprocessed image
    Mat digit = new Mat(preprocessed, x);

    int border = 50;
    Mat result = digit.clone();
    Core.copyMakeBorder(result, result, border, border, border, border, Core.BORDER_CONSTANT, new Scalar(0, 0, 0));

    Imgproc.resize(result, result, new Size(28, 28));
    digits.add(result);
}
asked Oct 15 '19 16:10 by youngpanda


3 Answers

After some research and experiments, I came to the conclusion that the image preprocessing itself was not the problem (I did change some of the suggested parameters, such as dilation size and shape, but they were not crucial to the results). What did help, however, were the following two things:

  1. As @f4f noticed, I needed to collect my own dataset with real-world data. This already helped tremendously.

  2. I made important changes to my segmentation preprocessing. After getting the individual contours, I first size-normalize each image to fit into a 20x20 pixel box while preserving its aspect ratio (as is done in MNIST). After that I place the box inside a 28x28 image so that the digit's center of mass lies at the center of the image (for a binary image, the center of mass is the mean row and mean column of the foreground pixels).
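The centering step in point 2 can be sketched roughly as follows. This is a minimal illustration, not the code from the original project: it assumes the digit has already been scaled to fit a 20x20 box and is held as a plain `int[][]` of grayscale values (0 = background), and the names `MnistCentering`, `centerOfMass` and `centerOnCanvas` are hypothetical. With OpenCV the same arithmetic would be applied to a `Mat`.

```java
/** Hypothetical helper for illustration; not part of the original code. */
final class MnistCentering {

    /** Returns {meanRow, meanCol} weighted by pixel value, or null for an all-zero image. */
    static double[] centerOfMass(int[][] img) {
        double sum = 0, rowSum = 0, colSum = 0;
        for (int r = 0; r < img.length; r++) {
            for (int c = 0; c < img[r].length; c++) {
                int v = img[r][c];
                sum += v;
                rowSum += (double) r * v;
                colSum += (double) c * v;
            }
        }
        if (sum == 0) {
            return null; // no foreground pixels, center of mass is undefined
        }
        return new double[] { rowSum / sum, colSum / sum };
    }

    /**
     * Pastes a digit into a size x size canvas so that its center of mass
     * lands as close as possible to the canvas center.
     */
    static int[][] centerOnCanvas(int[][] digit, int size) {
        int h = digit.length;
        int w = digit[0].length;
        if (h > size || w > size) {
            throw new IllegalArgumentException("digit must fit inside the canvas");
        }
        int[][] canvas = new int[size][size];
        double[] com = centerOfMass(digit);
        if (com == null) {
            return canvas; // empty digit: return an empty canvas
        }
        double mid = (size - 1) / 2.0; // 13.5 for a 28x28 canvas
        int top = (int) Math.round(mid - com[0]);
        int left = (int) Math.round(mid - com[1]);
        // Clamp the offsets so the digit never leaves the canvas
        top = Math.max(0, Math.min(size - h, top));
        left = Math.max(0, Math.min(size - w, left));
        for (int r = 0; r < h; r++) {
            for (int c = 0; c < w; c++) {
                canvas[top + r][left + c] = digit[r][c];
            }
        }
        return canvas;
    }
}
```

For a thin vertical "1" that is 20 pixels tall and 4 pixels wide (stroke in the two middle columns), `MnistCentering.centerOnCanvas(one, 28)` keeps the stroke narrow and places it at columns 13 and 14 of the canvas, instead of stretching it across the whole width.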

Of course, there are still difficult segmentation cases, such as overlapping or connected digits, but the above changes answered my initial question and improved my classification performance.

answered Sep 21 '22 22:09 by youngpanda


I believe your problem is the dilation process. I understand that you want to normalize image sizes, but you shouldn't break the proportions: scale the image by the largest factor that keeps both dimensions within the target size, and fill the rest of the image with the background color. It's not that "standard MNIST just hasn't seen the number one which looks like your test cases"; your processing makes your images look like other digits the network was trained on (the ones they get recognized as).

[image: overlap of the source and processed images]

If you compare the source and post-processed images at the correct aspect ratio, you can see that you did not just resize the image but "distorted" it. This can be the result of either non-homogeneous dilation or incorrect resizing.
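The aspect-preserving resize described above comes down to a little arithmetic, sketched here under some assumptions: the target box is 20x20 inside a 28x28 canvas (the MNIST layout), and the names `AspectFit`, `fitInside` and `padding` are hypothetical helpers, not OpenCV functions. The resulting width and height would be passed to `Imgproc.resize`, and the four padding values to `Core.copyMakeBorder`, both of which already appear in the question's code.

```java
/** Hypothetical helper for illustration; not an OpenCV API. */
final class AspectFit {

    /**
     * Returns {width, height} of the largest size with the same aspect ratio
     * as the input that still fits inside a box x box square.
     */
    static int[] fitInside(int width, int height, int box) {
        if (width <= 0 || height <= 0 || box <= 0) {
            throw new IllegalArgumentException("dimensions must be positive");
        }
        int longest = Math.max(width, height);
        // Scale both sides by the same factor box / longest; keep at least 1 pixel
        int w = Math.max(1, (int) Math.round((double) width * box / longest));
        int h = Math.max(1, (int) Math.round((double) height * box / longest));
        return new int[] { w, h };
    }

    /**
     * Returns {top, bottom, left, right} background padding that places a
     * w x h image in the middle of a canvas x canvas square.
     */
    static int[] padding(int w, int h, int canvas) {
        int top = (canvas - h) / 2;
        int left = (canvas - w) / 2;
        return new int[] { top, canvas - h - top, left, canvas - w - left };
    }
}
```

For a bounding box of 40x200 pixels (a typical narrow "1"), `AspectFit.fitInside(40, 200, 20)` gives a 4x20 image, and `AspectFit.padding(4, 20, 28)` gives 4 pixels of padding above and below and 12 pixels on each side, so the stroke stays thin instead of being stretched to fill the square.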

answered Oct 23 '22 12:10 by SiR


There are already some answers posted, but none of them answers your actual question about image preprocessing.

For my part, I don't see any significant problems with your implementation as long as it's a study project. Well done.

But there is one thing you may have missed. Mathematical morphology has two basic operations: erosion and dilation (the one you use). It also has compound operations, which are various combinations of the basic ones (e.g. opening and closing). The Wikipedia link is not the best CV reference, but you may start with it to get the idea.

It is usually better to use opening instead of erosion and closing instead of dilation, because the original binary image changes much less while the desired effect (cleaning sharp edges or filling gaps) is still achieved. So in your case you should try closing (dilation followed by erosion with the same kernel). An extra-small image such as 8*8 is heavily modified by dilation even with a very small kernel (one pixel is 12.5% of the width of an 8-pixel image); the effect is much smaller on larger images.

To visualize the idea, see the following pictures (from the OpenCV tutorials: 1, 2):

[image: dilation, original symbol and dilated one]

[image: closing, original symbol and closed one]
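To make the difference between dilation and closing concrete, here is a small self-contained sketch on a `boolean[][]` grid. It is only an illustration: it assumes a 3x3 square structuring element (not the 5x5 cross from the question), ignores neighbours that fall outside the image, and uses the hypothetical class name `BinaryMorphology`. In OpenCV the corresponding call would be `Imgproc.morphologyEx` with `Imgproc.MORPH_CLOSE`.

```java
/** Hypothetical plain-Java illustration of binary dilation, erosion and closing. */
final class BinaryMorphology {

    /** Dilation: a pixel is set if any pixel in its 3x3 neighbourhood is set. */
    static boolean[][] dilate(boolean[][] img) {
        return apply(img, true);
    }

    /** Erosion: a pixel stays set only if every in-bounds pixel in its 3x3 neighbourhood is set. */
    static boolean[][] erode(boolean[][] img) {
        return apply(img, false);
    }

    /** Closing: dilation followed by erosion with the same structuring element. */
    static boolean[][] close(boolean[][] img) {
        return erode(dilate(img));
    }

    private static boolean[][] apply(boolean[][] img, boolean dilate) {
        int h = img.length;
        int w = img[0].length;
        boolean[][] out = new boolean[h][w];
        for (int r = 0; r < h; r++) {
            for (int c = 0; c < w; c++) {
                boolean result = !dilate; // OR starts from false, AND starts from true
                for (int dr = -1; dr <= 1; dr++) {
                    for (int dc = -1; dc <= 1; dc++) {
                        int rr = r + dr;
                        int cc = c + dc;
                        if (rr < 0 || rr >= h || cc < 0 || cc >= w) {
                            continue; // neighbours outside the image are ignored
                        }
                        if (dilate) {
                            result |= img[rr][cc];
                        } else {
                            result &= img[rr][cc];
                        }
                    }
                }
                out[r][c] = result;
            }
        }
        return out;
    }

    /** Counts the set pixels, as a simple measure of how much the image changed. */
    static int count(boolean[][] img) {
        int n = 0;
        for (boolean[] row : img) {
            for (boolean v : row) {
                if (v) {
                    n++;
                }
            }
        }
        return n;
    }
}
```

Take a 9x9 image containing a one-pixel-wide vertical stroke of 4 pixels with a one-pixel gap in the middle. Dilation grows it to 21 pixels, a stroke three pixels wide, whereas closing yields 5 pixels: the original stroke with only the gap filled in.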

Hope it helps.

answered Oct 23 '22 12:10 by f4f