Say you have a dataset that has images and some data in a .csv
for each image.
Your goal is to create a NN that has a convolutional branch and a second branch (in my case an MLP).
Now, there are plenty of guides (one here, another one) on how to create the network; that's not the problem.
The issue here is how to create an iterator in the form of `[[convolution_input, other_features], target]` when `convolution_input` comes from a Keras `ImageDataGenerator` flow that adds augmented images.
More specifically: when the nth image (which may or may not be an augmented one) is fed to the NN, I want its original features inside `other_features`.
I found a few attempts at doing exactly this (here and here; the second one looked promising, but I wasn't able to figure out how to handle augmented images), but they do not seem to take into account the possible dataset manipulation that the Keras generator does.
Let's say you have a CSV such that your images and the other features are in the file, where `id` is the image name, followed by the features, followed by your target (class for classification, a number for regression):
| id | feat1 | feat2 | feat3 | class |
|---------------------|-------|-------|-------|-------|
| 1_face_IMG_NAME.jpg | 1 | 0 | 1 | A |
| 3_face_IMG_NAME.jpg | 1 | 0 | 1 | B |
| 2_face_IMG_NAME.jpg | 1 | 0 | 1 | A |
| ... | ... | ... | ... | ... |
First, let us define a data generator, and later we can override it.
Let us read the data from the CSV into a pandas data frame and use Keras's `flow_from_dataframe` to read from the data frame.
```python
import pandas
from keras.preprocessing.image import ImageDataGenerator

df = pandas.read_csv("dummycsv.csv")

datagen = ImageDataGenerator(rescale=1 / 255.)

generator = datagen.flow_from_dataframe(
    df,
    directory="out/",
    x_col="id",
    y_col=df.columns[1:],
    class_mode="raw",
    batch_size=1)
```
You can always add your augmentation in `ImageDataGenerator`.
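For example, a minimal sketch of an augmenting generator (the specific arguments shown, such as `rotation_range` and `horizontal_flip`, are just illustrative choices, not requirements of this approach):

```python
# Example only: any of the standard ImageDataGenerator augmentation
# arguments can be combined with the rescaling used above.
datagen = ImageDataGenerator(
    rescale=1 / 255.,
    rotation_range=15,        # random rotations up to 15 degrees
    width_shift_range=0.1,    # random horizontal shifts
    height_shift_range=0.1,   # random vertical shifts
    horizontal_flip=True)     # random horizontal flips
```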
Things to note in the above call to `flow_from_dataframe`:

- `x_col` = the image name.
- `y_col` = typically the column with the class name, but here we override it by first passing all the other columns in the CSV, i.e. `feat1`, `feat2`, ... up to `class`.
- `class_mode` = `"raw"` tells the generator to return all the values in `y` as is.
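As a quick sanity check of what this base generator yields, you can pull a single batch and inspect its shapes (this assumes the `generator` defined above):

```python
# Pull one batch from the base generator defined above.
batch_x, batch_y = next(generator)

print(batch_x.shape)  # images: (batch_size, height, width, channels)
print(batch_y.shape)  # y_col values: (batch_size, n_feature_columns + 1 class column)
```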
Now let us override/inherit the above generator and create a new one, such that it returns `[img, other_features], [target]`. Here is the code, with comments as explanations:
```python
import numpy as np

def my_custom_generator():
    # to keep track of a complete epoch
    count = 0
    while True:
        if count == len(generator):
            # one full pass over the data frame is done,
            # so reset the underlying generator and the counter
            generator.reset()
            count = 0
        count += 1
        # get the next batch from the underlying generator
        data = next(generator)
        # data looks like [[img, img, ...], [other_cols, other_cols, ...]]
        # depending on the batch size
        imgs = []
        cols = []
        targets = []
        # iterate over the batch and append the necessary columns
        # to the corresponding arrays
        for k in range(data[0].shape[0]):
            # the first array contains the images
            imgs.append(data[0][k])
            # the second array contains the features, with the class
            # as its last column, hence [:-1]
            cols.append(data[1][k][:-1])
            # the last column of the second array is the class
            targets.append(data[1][k][-1])
        # yield the result in the expected form; if the class column holds
        # string labels, encode them numerically before training
        yield [np.array(imgs), np.array(cols)], np.array(targets)
```
Create a similar function for your validation generator. Use `train_test_split` to split your data frame if you need it, create two generators, and override them.
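A minimal sketch of such a split, assuming the same `df` and `datagen` as above (the 0.2 validation fraction and `random_state` are arbitrary illustrative values):

```python
from sklearn.model_selection import train_test_split

# Split the data frame into training and validation parts.
train_df, val_df = train_test_split(df, test_size=0.2, random_state=42)

train_flow = datagen.flow_from_dataframe(
    train_df, directory="out/", x_col="id", y_col=list(train_df.columns[1:]),
    class_mode="raw", batch_size=1)

val_flow = datagen.flow_from_dataframe(
    val_df, directory="out/", x_col="id", y_col=list(val_df.columns[1:]),
    class_mode="raw", batch_size=1)

# Then wrap each flow in its own version of my_custom_generator,
# exactly as above.
```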
Pass the function to `model.fit_generator` like this:

```python
model.fit_generator(my_custom_generator(), .....other params)
```
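For completeness, here is a rough sketch of the kind of two-branch model this generator can feed. The layer sizes, the `(256, 256, 3)` image shape, the 3 extra features, and the binary output are assumptions based on the example CSV, not part of the original answer:

```python
from keras.layers import Input, Conv2D, MaxPooling2D, Flatten, Dense, concatenate
from keras.models import Model

# Convolutional branch for the images (the input shape is an assumption;
# it must match the target_size/color_mode of the generator).
img_in = Input(shape=(256, 256, 3))
x = Conv2D(16, (3, 3), activation="relu")(img_in)
x = MaxPooling2D((2, 2))(x)
x = Flatten()(x)

# MLP branch for the extra CSV features (3 features in the example table).
feat_in = Input(shape=(3,))
y = Dense(8, activation="relu")(feat_in)

# Merge the two branches and add the output head (a single sigmoid unit,
# assuming a binary A/B classification; the class column would need to be
# numerically encoded for this loss).
merged = concatenate([x, y])
out = Dense(1, activation="sigmoid")(merged)

model = Model(inputs=[img_in, feat_in], outputs=out)
model.compile(optimizer="adam", loss="binary_crossentropy")

# steps_per_epoch should be the number of batches in one pass over the data.
model.fit_generator(my_custom_generator(),
                    steps_per_epoch=len(generator),
                    epochs=10)
```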