How to load Image Masks (Labels) for Image Segmentation in Keras

I am using Tensorflow as a backend to Keras and I am trying to understand how to bring in my labels for image segmentation training.

I am using the LFW Parts Dataset (about 1500 training images), which provides both the ground truth image and a ground truth mask that looks like this:

[Image: Aaron_Peirsol_0002 ground truth image and corresponding mask]

As I understand the process, during training, I load both the

  • (X) Image
  • (Y) Mask Image

loading them in batches to meet my needs. My question is: is it sufficient to load them both (image and mask image) as NumPy arrays of shape (N, N, 3), or do I need to process/reshape the mask image in some way? Effectively, the mask/labels are represented as [R, G, B] pixels where:

  • [255, 0, 0] Hair
  • [0, 255, 0] Face
  • [0, 0, 255] Background

I could normalize it to 0-1 like this, though I don't know if I should:

from PIL import Image
import numpy as np

im = Image.open(path)
label = np.array(im, dtype=np.uint8)
label = np.multiply(label, 1.0/255)

so I end up with:

  • [1, 0, 0] Hair
  • [0, 1, 0] Face
  • [0, 0, 1] Background

Everything I found online uses existing datasets in TensorFlow or Keras. Nothing is really all that clear on how to pull things off if you have what could be considered a custom dataset.

I found this related to Caffe: https://groups.google.com/forum/#!topic/caffe-users/9qNggEa8EaQ

And they advocate converting the mask images to an (H, W, 1) shape, where my classes would be 0, 1, 2 for background, hair, and face respectively.

It may be that this is a duplicate of these (a combination of similar questions/answers):

How to implement multi-class semantic segmentation?

Tensorflow: How to create a Pascal VOC style image

I found one example that processes PascalVOC into (N, N, 1) that I adapted:

LFW_PARTS_PALETTE = {
    (0, 0, 255) : 0, # background (blue)
    (255, 0, 0) : 1, # hair (red)
    (0, 255, 0) : 2, # face (green)
}

def convert_from_color_segmentation(arr_3d):
    arr_2d = np.zeros((arr_3d.shape[0], arr_3d.shape[1]), dtype=np.uint8)
    palette = LFW_PARTS_PALETTE

    for i in range(0, arr_3d.shape[0]):
        for j in range(0, arr_3d.shape[1]):
            key = (arr_3d[i, j, 0], arr_3d[i, j, 1], arr_3d[i, j, 2])
            arr_2d[i, j] = palette.get(key, 0) # default value if key was not found is 0

    return arr_2d
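The per-pixel Python loop above is slow on full-size images; the same color-to-class mapping can be done with NumPy boolean masks. A sketch (the corrected palette with (0, 255, 0) for face is assumed; the 1x3 demo mask is made up):

```python
import numpy as np

LFW_PARTS_PALETTE = {
    (0, 0, 255): 0,   # background (blue)
    (255, 0, 0): 1,   # hair (red)
    (0, 255, 0): 2,   # face (green)
}

def convert_from_color_segmentation_fast(arr_3d, palette=LFW_PARTS_PALETTE):
    """Vectorized RGB mask -> (H, W) class-index map; same result as the loop."""
    arr_2d = np.zeros(arr_3d.shape[:2], dtype=np.uint8)
    for rgb, cls in palette.items():
        # Boolean (H, W) mask of pixels matching this palette color
        matches = np.all(arr_3d == np.array(rgb, dtype=np.uint8), axis=-1)
        arr_2d[matches] = cls
    return arr_2d

# Tiny hypothetical 1x3 mask: one hair, one face, one background pixel
demo = np.array([[[255, 0, 0], [0, 255, 0], [0, 0, 255]]], dtype=np.uint8)
demo_2d = convert_from_color_segmentation_fast(demo)
```

Unmatched colors fall through to class 0, just like the `palette.get(key, 0)` default in the loop version.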

I think this might be close to what I want, but not spot on. I think I need it to be (N, N, 3) since I have 3 classes? The version above and another one originate from these 2 locations:

https://github.com/martinkersner/train-CRF-RNN/blob/master/utils.py#L50

https://github.com/DrSleep/tensorflow-deeplab-resnet/blob/ce75c97fc1337a676e32214ba74865e55adc362c/deeplab_resnet/utils.py#L41 (this link one-hot's the values)
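If the (H, W) class map does need to become (H, W, num_classes), one-hot encoding it is a one-liner with NumPy: index an identity matrix by the class map. A sketch with a made-up 2x2 label map:

```python
import numpy as np

num_classes = 3
# Hypothetical 2x2 class map, as produced by convert_from_color_segmentation:
# 0 = background, 1 = hair, 2 = face
label_2d = np.array([[0, 1],
                     [2, 0]], dtype=np.uint8)

# Row i of the identity matrix is the one-hot vector for class i,
# so fancy-indexing by label_2d yields shape (H, W, num_classes)
one_hot = np.eye(num_classes, dtype=np.uint8)[label_2d]
```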

asked Jul 18 '17 by AJ Venturella



1 Answer

Since this is semantic segmentation, you are classifying each pixel in the image, so you will most likely use a cross-entropy loss. Keras, as well as TensorFlow, requires your mask to be one-hot encoded, so the output dimension of your mask should be something like [batch, height, width, num_classes]. Before computing the cross-entropy loss, you reshape both your logits and your mask the same way, to the tensor shape [-1, num_classes], where -1 denotes 'as many as required'.
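A NumPy sketch of just that reshape, to make the shapes concrete (the arrays are zero-filled stand-ins for the network output and the one-hot mask; the actual loss call is omitted):

```python
import numpy as np

batch, height, width, num_classes = 2, 4, 4, 3

# Stand-ins for the network output and the one-hot encoded mask
logits = np.zeros((batch, height, width, num_classes), dtype=np.float32)
labels = np.zeros((batch, height, width, num_classes), dtype=np.float32)

# Flatten so every pixel becomes its own row of class scores,
# which is what a softmax cross-entropy op expects
flat_logits = logits.reshape(-1, num_classes)  # (batch*height*width, num_classes)
flat_labels = labels.reshape(-1, num_classes)
```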

Have a look here at the end

Since your question is about loading your own images: I just finished building an input pipeline for segmentation myself. It is in TensorFlow, though, so I don't know if it helps you; have a look if you are interested: Tensorflow input pipeline for segmentation

answered Nov 10 '22 by Hasnain Raza