
How to implement multi-class semantic segmentation?

I'm able to train a U-net with labeled images that have a binary classification.

But I'm having a hard time figuring out how to configure the final layers in Keras/Theano for multi-class classification (4 classes).

I have 634 images and 634 corresponding masks that are uint8 and 64 x 64 pixels.

My masks, instead of being black (0) and white (1), have color-labeled objects in 3 categories plus background, as follows:

  • black (0), background
  • red (1), object class 1
  • green (2), object class 2
  • yellow (3), object class 3

Before training runs, the array containing masks is one-hot encoded as follows:

mask_train = to_categorical(mask_train, 4)

This makes mask_train.shape go from (634, 1, 64, 64) to (2596864, 4).

My model closely follows the U-Net architecture; however, the final layers seem problematic, as I'm unable to flatten the structure so that it matches the one-hot encoded array.

[...]
up3 = concatenate([UpSampling2D(size=(2, 2))(conv7), conv2], axis=1)
conv8 = Conv2D(128, (3, 3), activation='relu', padding='same')(up3)
conv8 = Conv2D(128, (3, 3), activation='relu', padding='same')(conv8)

up4 = concatenate([UpSampling2D(size=(2, 2))(conv8), conv1], axis=1)
conv9 = Conv2D(64, (3, 3), activation='relu', padding='same')(up4)
conv10 = Conv2D(64, (3, 3), activation='relu', padding='same')(conv9)

# here I used number classes = number of filters and softmax although
# not sure if a dense layer should be here instead
conv11 = Conv2D(4, (1, 1), activation='softmax')(conv10)

model = Model(inputs=[inputs], outputs=[conv11])

# here categorical cross entropy is being used but may not be correct
model.compile(optimizer='sgd', loss='categorical_crossentropy',
              metrics=['accuracy'])

return model

Do you have any suggestions on how to modify the final portions of the model so this trains successfully? I get a variety of shape mismatch errors, and the few times I managed to make it run, the loss did not change throughout epochs.

pepe, asked May 10 '17 18:05



2 Answers

You should have your target as (634,4,64,64) if you're using channels_first.
Or (634,64,64,4) if channels_last.

Each channel of your target should be one class. Each channel is an image of 0's and 1's, where 1 means that pixel is that class and 0 means that pixel is not that class.

Your target is then 634 groups, each containing four images of 64x64 pixels, where a pixel value of 1 indicates that the pixel belongs to that channel's class.

I'm not sure the result will be ordered correctly, but you can try:

import numpy as np
from keras.utils import to_categorical

mask_train = to_categorical(mask_train, 4)
# channels_last is used here because to_categorical outputs the classes last: (2596864, 4)
mask_train = mask_train.reshape((634, 64, 64, 4))

# move the class axis to position 1 for channels_first: (634, 4, 64, 64)
mask_train = np.moveaxis(mask_train, -1, 1)
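
Whether the reshape keeps the pixels in order can be checked on a small array. The following is a minimal sketch using NumPy only. It assumes to_categorical flattens its input in row-major order before encoding, as the (2596864, 4) shape suggests; the helper flat_one_hot is hypothetical and only stands in for that behaviour.

```python
import numpy as np

def flat_one_hot(labels, num_classes):
    """Hypothetical stand-in for a to_categorical that flattens its input."""
    flat = labels.ravel()                      # row-major flattening
    out = np.zeros((flat.size, num_classes), dtype=np.float32)
    out[np.arange(flat.size), flat] = 1.0      # exactly one 1 per pixel
    return out

rng = np.random.default_rng(0)
# Small stand-in for the real masks: 5 samples, 1 channel, 8 x 8 pixels.
mask = rng.integers(0, 4, size=(5, 1, 8, 8)).astype(np.uint8)

encoded = flat_one_hot(mask, 4)                # shape (5 * 8 * 8, 4)
encoded = encoded.reshape((5, 8, 8, 4))        # channels_last
encoded = np.moveaxis(encoded, -1, 1)          # channels_first: (5, 4, 8, 8)

# Taking argmax over the class axis should give back the original labels.
recovered = encoded.argmax(axis=1)
ordering_ok = np.array_equal(recovered, mask[:, 0])
print(encoded.shape, ordering_ok)
```

If ordering_ok is False on your own data, the flattening assumption does not hold for your Keras version and the manual approach is the safer route.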

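If the ordering doesn't work properly, you can do it manually, starting from the original integer masks (before to_categorical is applied):

newMask = np.zeros((634, 4, 64, 64))

for samp in range(len(mask_train)):
    im = mask_train[samp, 0]          # (64, 64) image of class indices
    for x in range(len(im)):
        row = im[x]
        for y in range(len(row)):
            y_val = int(row[y])       # class index of this pixel
            newMask[samp, y_val, x, y] = 1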
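
The triple loop can also be written without Python-level iteration. This is a sketch, assuming mask_int (a name introduced here for illustration) holds the original integer labels with shape (N, 1, H, W) before any call to to_categorical; the random data is a stand-in for the real masks.

```python
import numpy as np

num_classes = 4
rng = np.random.default_rng(1)
# Hypothetical stand-in for the integer masks: values 0..3, shape (N, 1, H, W).
mask_int = rng.integers(0, num_classes, size=(6, 1, 16, 16)).astype(np.uint8)

# Indexing the identity matrix with the labels appends a one-hot axis:
# (N, H, W) -> (N, H, W, num_classes).
one_hot_last = np.eye(num_classes, dtype=np.float32)[mask_int[:, 0]]

# Move the class axis to position 1 for channels_first: (N, num_classes, H, W).
new_mask = np.moveaxis(one_hot_last, -1, 1)

# Reference result from the explicit loop, for comparison only.
loop_mask = np.zeros((6, num_classes, 16, 16), dtype=np.float32)
for samp in range(mask_int.shape[0]):
    for x in range(16):
        for y in range(16):
            loop_mask[samp, mask_int[samp, 0, x, y], x, y] = 1
```

Both arrays should be identical; the vectorized form is mainly a speed and readability choice.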
Daniel Möller, answered Sep 20 '22 16:09


Bit late but you should try

mask_train = to_categorical(mask_train, num_classes=None)

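In recent Keras versions, to_categorical keeps the input dimensions and appends the class axis at the end, so an integer mask of shape (634, 64, 64) becomes (634, 64, 64, 4): a binary mask for each individual class (one-hot encoded). For channels_first, the class axis can then be moved to position 1 to obtain (634, 4, 64, 64).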

The last conv layer, activation and loss look good for multiclass segmentation.
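
To see why the target has to share the shape of the network output, here is a sketch of what a pixel-wise softmax followed by categorical cross-entropy computes. It is written in NumPy with channels_last and random logits as stand-ins; it illustrates the formula and is not Keras's implementation. All function and variable names are hypothetical.

```python
import numpy as np

def softmax(logits, axis=-1):
    """Numerically stable softmax over the class axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

def pixelwise_cce(y_true, y_prob, eps=1e-7):
    """Mean categorical cross-entropy over all pixels (class axis last)."""
    y_prob = np.clip(y_prob, eps, 1.0)
    return float(-(y_true * np.log(y_prob)).sum(axis=-1).mean())

rng = np.random.default_rng(2)
labels = rng.integers(0, 4, size=(3, 8, 8))       # integer class per pixel
y_true = np.eye(4, dtype=np.float32)[labels]      # (3, 8, 8, 4) one-hot target
logits = rng.normal(size=(3, 8, 8, 4))            # stand-in for network output
y_prob = softmax(logits, axis=-1)                 # class probabilities per pixel

loss_random = pixelwise_cce(y_true, y_prob)
loss_perfect = pixelwise_cce(y_true, y_true)      # prediction equals target
pred_labels = y_prob.argmax(axis=-1)              # back to a label map
```

Target and prediction carry the class axis in the same position, and the softmax has to normalise over that same axis; it is worth confirming this for your own data format.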

Daniel, answered Sep 17 '22 16:09