Ground truth pixel labels in PASCAL VOC for semantic segmentation

Question

I'm experimenting with FCN(Fully Convolutional Network), and trying to reproduce the results reported in the original paper (Long et al. CVPR'15).

In that paper the authors reported results on PASCAL VOC dataset. After downloading and untarring the train-val dataset for 2012 (http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar ), I noticed there are 2913 png files in the SegmentationClass and same number of files in SegmentationObject subdirectory.

The pixel values in these png files seem to be multiples of 32 (e.g. 0, 128, 192, 224...), which don't fall in the range between 0 and 20. I'm just wondering what's the correspondence between the pixel values and ground truth labels for pixels. Or am I looking at the wrong files?

mkisantal · Accepted Answer

Just downloaded Pascal VOC. The pixel values in the dataset are as follows:

0: background
[1 .. 20] interval: segmented objects, classes [Aeroplane, ..., Tvmonitor]
255: void category, used for border regions (5px) and to mask difficult objects

You can find more info on the dataset here.

The previous answer by captainist discusses png files saved with color palettes, I think it's not related to the original question. The linked tensorflow code simply loads a png that was saved with color map (palette), then converts it to numpy array (at this step the color palette is lost), then saves the array as a png again. The numerical values are not changed in this process, only the color palette is removed.

captainst · Answer

I know that this question was asked some time ago. But I raised myself a similar question when trying on PASCAL VOC 2012 with tensorflow deeplab.

If you look at the file_download_and_convert_voc2012.sh, there are lines marked by "# Remove the colormap in the ground truth annotations". This part process the original SegmentationClass files and produce the raw segmented image files, which have each pixel value between 0 : 20. (If you may ask why, check this post: Python: Use PIL to load png file gives strange results)

Pay attention to this magic function:

def _remove_colormap(filename):
  """Removes the color map from the annotation.

  Args:
    filename: Ground truth annotation filename.

  Returns:
    Annotation without color map.
  """
  return np.array(Image.open(filename))

I have to admit that I do not fully understand the operation by

np.array(Image.open(filename))

I have shown here below a set of images for your referece (from above down: orignal image, segmentation class, and segmentation raw class)

enter image description here

hiankun · Answer

The values mentioned in the original question look like the "color map" values, which could be obtained by getpalette() function from PIL Image module.

For the annotated values of the VOC images, I use the following code snip to check them:

import numpy as np
from PIL import Image

files = [ 
        'SegmentationObject/2007_000129.png',
        'SegmentationClass/2007_000129.png',
        'SegmentationClassRaw/2007_000129.png', # processed by _remove_colormap()
                                                # in captainst's answer...
        ]

for f in files:
    img = Image.open(f)
    annotation = np.array(img)
    print('
file: {}
anno: {}
img info: {}'.format(
        f, set(annotation.flatten()), img))

The three images used in the code are shown below (left to right, respectively):

enter image description here

The corresponding outputs of the code are as follows:

file: SegmentationObject/2007_000129.png
anno: {0, 1, 2, 3, 4, 5, 6, 255}
img info: <PIL.PngImagePlugin.PngImageFile image mode=P size=334x500 at 0x7F59538B35F8>

file: SegmentationClass/2007_000129.png
anno: {0, 2, 15, 255}
img info: <PIL.PngImagePlugin.PngImageFile image mode=P size=334x500 at 0x7F5930DD5780>

file: SegmentationClassRaw/2007_000129.png
anno: {0, 2, 15, 255}
img info: <PIL.PngImagePlugin.PngImageFile image mode=L size=334x500 at 0x7F5930DD52E8>

There are two things I learned from the above output.

First, the annotation values of the images in SegmentationObject folder are assigned by the number of objects. In this case there are 3 people and 3 bicycles, and the annotated values are from 1 to 6. However, for images in SegmentationClass folder, their values are assigned by the class value of the objects. All the people belong to class 15 and all the bicycles are class 2.

Second, as mkisantal has already mentioned, after the np.array() operation, the color palette was removed (I "know" it by observing the results but I still don't understand the mechanism under the hood...). We can confirm this by checking the image mode of the outputs:

Both the SegmentationObject/2007_000129.png and SegmentationClass/2007_000129.png have image mode=P while
SegmentationClassRaw/2007_000129.png has image mode=L. (ref: The modes of PIL Image)

Ground truth pixel labels in PASCAL VOC for semantic segmentation

Tags:

semantic-segmentation

cccjjj

3 Answers

mkisantal

captainst

hiankun

Recent Activity

Donate For Us

Ground truth pixel labels in PASCAL VOC for semantic segmentation

Tags:

semantic-segmentation

cccjjj

3 Answers

mkisantal

captainst

hiankun

Related questions

Recent Activity

Donate For Us