
Creating custom datasets

I imagine this is a broadly applicable question, but I'm trying to create a dataset for a particular competition that involves flying a UAV over a field of cardboard geometric shapes with alphanumeric characters painted on them. The objective is to detect and classify the shapes and characters.

Currently, I'm using SURF to detect the shape, K-means to segment the shape and character, and a convolutional neural network to classify each. However, my bottleneck is training data: I can't find or generate a dataset that produces a model that performs well on real data.

What I've Tried

  • Generating a dataset with Keras' ImageDataGenerator by applying random rotations, scalings, and skewings to a template image of each geometric shape and of each alphanumeric character in a typewritten font: works fine on data from the dataset (go figure) and on some outside data, but gets confused when the characters deviate too far from the templates

  • Using the MNIST dataset: no complaints, but only contains numbers

  • Using the EMNIST ByClass dataset (which is different from the MNIST dataset; it contains letters as well): doesn't train easily because of its size, and doesn't perform well on real data even when trained to a decently high accuracy. In the dataset itself, many images bear little resemblance to their purported class, and some classes are at different rotations than others

  • Using Tesseract OCR for the characters: this hasn't given great results
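
For reference, the template-augmentation approach in the first bullet can be sketched roughly as below. This is a NumPy-only stand-in for illustration, not Keras' ImageDataGenerator itself; the helper names (random_affine, augment), the parameter ranges, and the vertical-bar template are all assumptions made up for the example.

```python
import numpy as np

def random_affine(image, rng, max_rotation_deg=30.0,
                  scale_range=(0.8, 1.2), max_shear=0.2):
    """Randomly rotate, scale and shear a 2-D image about its centre.

    Uses nearest-neighbour sampling; pixels that fall outside the
    source image are filled with 0.
    """
    height, width = image.shape
    angle = np.deg2rad(rng.uniform(-max_rotation_deg, max_rotation_deg))
    scale = rng.uniform(*scale_range)
    shear = rng.uniform(-max_shear, max_shear)

    rotation = np.array([[np.cos(angle), -np.sin(angle)],
                         [np.sin(angle), np.cos(angle)]])
    shear_matrix = np.array([[1.0, shear],
                             [0.0, 1.0]])
    forward = (scale * rotation) @ shear_matrix
    # Map each output pixel back to the source pixel it comes from.
    inverse = np.linalg.inv(forward)

    centre = np.array([(height - 1) / 2.0, (width - 1) / 2.0])
    rows, cols = np.mgrid[0:height, 0:width]
    coords = np.stack([rows.ravel(), cols.ravel()]).astype(float)
    source = inverse @ (coords - centre[:, None]) + centre[:, None]
    src_rows = np.rint(source[0]).astype(int)
    src_cols = np.rint(source[1]).astype(int)

    inside = ((src_rows >= 0) & (src_rows < height) &
              (src_cols >= 0) & (src_cols < width))
    output = np.zeros(height * width, dtype=image.dtype)
    output[inside] = image[src_rows[inside], src_cols[inside]]
    return output.reshape(height, width)

def augment(template, count, seed=0):
    """Return `count` randomly transformed copies of one template."""
    rng = np.random.default_rng(seed)
    return np.stack([random_affine(template, rng) for _ in range(count)])

# Hypothetical 28x28 template: a vertical bar standing in for a character.
template = np.zeros((28, 28), dtype=np.uint8)
template[6:22, 12:16] = 255
batch = augment(template, count=8)  # shape (8, 28, 28)
```

Every sample here is a geometric warp of the same template, which is consistent with the behaviour described above: such a dataset covers pose variation but not variation in the glyph itself.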

What I Haven't Tried

  • Doing several flyovers with real cardboard cutouts that we create and using several frames from each video for the dataset. Cons: this would require quite a lot of flights and cardboard cutouts and wouldn't offer much data variation.

  • Using the ImageDataGenerator, but on several different fonts instead of one.

Does anyone have any advice on how to create a custom dataset for a task like this?

Josh Payne asked Feb 11 '26 20:02


1 Answer

This is my dataSetGenerator; it may help you generate your own dataset from a folder of images:

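import os
from glob import glob

import cv2
import numpy as np

def dataSetGenerator(path, resize=False, resize_to=224, percentage=100):
    """
    Load an image dataset stored as one sub-folder per class.

    DataSetsFolder
      |
      |----------class-1
      |        .   |-------image-1
      |        .   |         .
      |        .   |         .
      |        .   |         .
      |        .   |-------image-n
      |        .
      |-------class-n

    :param path: <path>/DataSetsFolder
    :param resize: if True, resize every image to (resize_to, resize_to)
    :param resize_to: side length in pixels used when resize is True
    :param percentage: percentage (0-100) of the images to return, picked at random
    :return: images, labels (one-hot), classes
    """
    # Keep only sub-folders, sorted so the class order is reproducible.
    classes = sorted(d for d in os.listdir(path)
                     if os.path.isdir(os.path.join(path, d)))
    image_list = []
    labels = []
    for index, classe in enumerate(classes):
        # Only .tif files are read; change the pattern for other formats.
        for filename in glob(os.path.join(path, classe, '*.tif')):
            image = cv2.imread(filename)
            if image is None:  # cv2.imread returns None for unreadable files
                continue
            if resize:
                image = cv2.resize(image, (resize_to, resize_to))
            image_list.append(image)
            label = np.zeros(len(classes))  # one-hot label for this class
            label[index] = 1
            labels.append(label)
    # Shuffle, then keep the requested percentage of the samples.
    # Without resize, all images must already share one shape to stack cleanly.
    indice = np.random.permutation(len(image_list))[:int(len(image_list) * percentage / 100)]
    return (np.array([image_list[x] for x in indice]),
            np.array([labels[x] for x in indice]),
            np.array(classes))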
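
As a possible follow-up, the arrays returned by dataSetGenerator could be split into training and validation sets along these lines. This is only a sketch: split_dataset is a hypothetical helper that is not part of the answer's code, and the placeholder arrays merely imitate the returned layout (images of shape (n, height, width, 3), one-hot labels of shape (n, n_classes)) so the example runs without any image files.

```python
import numpy as np

def split_dataset(images, labels, validation_fraction=0.2, seed=0):
    """Shuffle images and labels together, then split off a validation set."""
    if len(images) != len(labels):
        raise ValueError("images and labels must have the same length")
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(images))
    n_val = int(len(images) * validation_fraction)
    val_idx, train_idx = order[:n_val], order[n_val:]
    return ((images[train_idx], labels[train_idx]),
            (images[val_idx], labels[val_idx]))

# Placeholder data standing in for the output of
# dataSetGenerator(path, resize=True, resize_to=32) with two classes.
images = np.zeros((10, 32, 32, 3), dtype=np.uint8)
labels = np.eye(2)[np.arange(10) % 2]  # alternating one-hot rows

(train_x, train_y), (val_x, val_y) = split_dataset(images, labels)
```

Fixing the seed keeps the split reproducible between runs, which makes validation scores comparable while the dataset is still being tuned.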

Sakhri Houssem answered Feb 14 '26 09:02


