Decoding RLE (run-length encoding) mask with Tensorflow Datasets

I have been experimenting with TensorFlow datasets (tf.data), but I cannot figure out how to efficiently create masks from RLE strings. For reference, I am using data from the Airbus Ship Detection Challenge on Kaggle: https://www.kaggle.com/c/airbus-ship-detection/data

I know that my RLE-decoding function, borrowed from one of the kernels, works:

import numpy as np

def rle_decode(mask_rle, shape=(768, 768)):
    '''
    mask_rle: run-length as string formatted (start length)
    shape: (height,width) of array to return
    Returns numpy array, 1 - mask, 0 - background
    '''
    # Non-string input (e.g. a missing value) yields an all-background mask
    if not isinstance(mask_rle, str):
        img = np.zeros(shape[0]*shape[1], dtype=np.uint8)
        return img.reshape(shape).T

    s = mask_rle.split()
    starts, lengths = [np.asarray(x, dtype=int) for x in (s[0:][::2], s[1:][::2])]
    starts -= 1  # RLE starts are 1-based
    ends = starts + lengths
    img = np.zeros(shape[0]*shape[1], dtype=np.uint8)
    for lo, hi in zip(starts, ends):
        img[lo:hi] = 1
    return img.reshape(shape).T

However, it does not seem to play nicely with the pipeline:

list_ds = tf.data.Dataset.list_files(train_paths_abs)
ds = list_ds.map(parse_img)

With the following parse function, everything works fine:

def parse_img(file_path,new_size=[128,128]):    
    img_content = tf.io.read_file(file_path)
    img = tf.image.decode_jpeg(img_content)
    img = tf.image.convert_image_dtype(img, tf.float32)    
    img = tf.image.resize(img,new_size)
    return img

But things go rogue if I include the mask:

def parse_img(file_path,new_size=[128,128]):
    
    # Image
    img_content = tf.io.read_file(file_path)
    img = tf.image.decode_jpeg(img_content)
    img = tf.image.convert_image_dtype(img, tf.float32)    
    img = tf.image.resize(img,new_size)
    
    # Mask
    file_id = tf.strings.split(file_path,'/')[-1]
    objects = [rle_decode(m) for m in df2[df.ImageId==file_id]]
    mask = np.sum(objects,axis=0)
    mask = np.expand_dims(mask,3)   # Force mask to have 3 channels, necessary for resize step
    mask = tf.image.convert_image_dtype(mask, tf.int8)
    mask = tf.clip_by_value(mask,0,1)
    mask = tf.image.resize(mask,new_size)
    mask = tf.squeeze(mask)     # squeeze back
    mask = tf.image.convert_image_dtype(mask, tf.int8)
    
    return img, mask

Although my parse_img function works fine on its own (I checked it on a sample; it takes 271 µs ± 67.9 µs per run), the list_ds.map step takes forever (more than 5 minutes) before hanging. I can't figure out what's wrong, and it is driving me crazy! Any ideas?

Alex asked Nov 04 '19 12:11


1 Answer

You can rewrite the rle_decode function with TensorFlow operations like this (I leave out the final transposition to keep it more general; you can apply it afterwards):

import tensorflow as tf

def rle_decode_tf(mask_rle, shape):
    shape = tf.convert_to_tensor(shape, tf.int64)
    size = tf.math.reduce_prod(shape)
    # Split string
    s = tf.strings.split(mask_rle)
    s = tf.strings.to_number(s, tf.int64)
    # Get starts and lengths
    starts = s[::2] - 1
    lens = s[1::2]
    # Make ones to be scattered
    total_ones = tf.reduce_sum(lens)
    ones = tf.ones([total_ones], tf.uint8)
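    # Make scattering indices: for the r-th one, find the run it belongs to,
    # then shift r by that run's start minus the ones emitted by earlier runs
    r = tf.range(total_ones)
    lens_cum = tf.math.cumsum(lens)
    run_idx = tf.searchsorted(lens_cum, r, 'right')
    idx = r + tf.gather(starts - tf.pad(lens_cum[:-1], [(1, 0)]), run_idx)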
    # Scatter ones into flattened mask
    mask_flat = tf.scatter_nd(tf.expand_dims(idx, 1), ones, [size])
    # Reshape into mask
    return tf.reshape(mask_flat, shape)
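
The "scattering indices" step is the least obvious part of the function above, so here is a minimal pure-Python sketch of the same arithmetic. It is only an illustration, not part of the TensorFlow code: the helper name rle_scatter_indices is hypothetical, itertools.accumulate stands in for tf.math.cumsum, and bisect_right stands in for tf.searchsorted with side 'right'.

```python
from bisect import bisect_right
from itertools import accumulate

def rle_scatter_indices(starts, lens):
    """Return the flat index of every foreground pixel.

    starts: 0-based run starts; lens: run lengths.
    (Hypothetical helper, mirrors the index arithmetic of rle_decode_tf.)
    """
    lens_cum = list(accumulate(lens))      # ones emitted up to and including each run
    prev_cum = [0] + lens_cum[:-1]         # ones emitted before each run
    offsets = [s - p for s, p in zip(starts, prev_cum)]
    total_ones = lens_cum[-1] if lens_cum else 0
    idx = []
    for r in range(total_ones):
        run = bisect_right(lens_cum, r)    # run that the r-th one belongs to
        idx.append(r + offsets[run])
    return idx

# Same runs as the string '1 2 4 3 9 4 15 5', with starts shifted to 0-based
print(rle_scatter_indices([0, 3, 8, 14], [2, 3, 4, 5]))
# [0, 1, 3, 4, 5, 8, 9, 10, 11, 14, 15, 16, 17, 18]
```

Each of the 14 ones is numbered r = 0..13. The cumulative lengths tell which run a given r falls into, and adding that run's offset turns r into a position in the flattened mask.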

A small test (TensorFlow 2.0):

mask_rle = '1 2 4 3 9 4 15 5'
shape = [4, 6]
# Original NumPy function
print(rle_decode(mask_rle, shape))
# [[1 0 0 1]
#  [1 0 0 0]
#  [0 1 1 0]
#  [1 1 1 0]
#  [1 1 1 0]
#  [1 1 1 0]]
# TensorFlow function (transposing is done out of the function)
tf.print(tf.transpose(rle_decode_tf(mask_rle, shape)))
# [[1 0 0 1]
#  [1 0 0 0]
#  [0 1 1 0]
#  [1 1 1 0]
#  [1 1 1 0]
#  [1 1 1 0]]
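
Inside Dataset.map the file path arrives as a tensor, so a pandas lookup like the one in the question cannot see the actual file name. One possible workaround, sketched below under assumptions, is to precompute a single RLE string per image before building the dataset. Since every run is an independent (start, length) pair, the strings of several objects can simply be joined with spaces. The helper name join_rle_per_image and the sample rows are hypothetical; the rows stand in for (ImageId, EncodedPixels) pairs, where EncodedPixels is assumed to be the name of the RLE column.

```python
from collections import defaultdict

def join_rle_per_image(rows):
    """Map each image id to one space-joined RLE string.

    rows: iterable of (image_id, rle) pairs. rle is a str, or a non-str
    value (such as None or NaN) for an image without any mask.
    (Hypothetical helper, not part of the original answer.)
    """
    runs = defaultdict(list)
    for image_id, rle in rows:
        parts = runs[image_id]   # creates an entry even for mask-less images
        if isinstance(rle, str):
            parts.append(rle)
    return {image_id: " ".join(parts) for image_id, parts in runs.items()}

# Hypothetical rows standing in for (ImageId, EncodedPixels) pairs
rows = [
    ("a.jpg", "1 2"),
    ("a.jpg", "9 4"),
    ("b.jpg", None),
]
print(join_rle_per_image(rows))
# {'a.jpg': '1 2 9 4', 'b.jpg': ''}
```

The resulting strings could then be passed together with the file paths to tf.data.Dataset.from_tensor_slices and decoded with rle_decode_tf inside map; that wiring is not tested here. If runs from different objects overlap, the scattered ones would likely add up, so clipping the mask to 0..1 as in the question may still be needed.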
jdehesa answered Sep 17 '22 18:09