I have two sets of files: masks and images. There is no tiff decoder in 'tensorflow', but there is 'tfio.experimental'. Tiff files have more than 4 channels.
this code doesnt work:
    import numpy as np
    import tiffile as tiff
    import tensorflow as tf
    for i in range(100):
      a = np.random.random((30, 30, 8))
      b = np.random.randint(10, size = (30, 30, 8))
      tiff.imsave('new1//images'+str(i)+'.tif', a)
      tiff.imsave('new2//images'+str(i)+'.tif', b)
    import glob
    paths1 = glob.glob('new1//*.*')
    paths2 = glob.glob('new2//*.*')
    def load(image_file, mask_file):
      image = tf.io.read_file(image_file)
      image = tfio.experimental.image.decode_tiff(image)
      mask = tf.io.read_file(mask_file)
      mask = tfio.experimental.image.decode_tiff(mask)
      input_image = tf.cast(image, tf.float32)
      mask_image = tf.cast(mask, tf.uint8)
      return input_image, mask_image
    AUTO = tf.data.experimental.AUTOTUNE
    BATCH_SIZE = 32
    dataloader = tf.data.Dataset.from_tensor_slices((paths1, paths2))
    dataloader = (
    dataloader
    .shuffle(1024)
    .map(load, num_parallel_calls=AUTO)
    .batch(BATCH_SIZE)
    .prefetch(AUTO)
    )
it is impossible to keep entire dataset in the memory, saving to numpy arrays also gives no easy solution. Although code provided above gives no error directly. But shape of images is (None, None, None)
'model.fit' gives error
Is there alternative way to save arrays? I only see bruteforce solution with manual feeding random batches during custom training.
I found solution for my question: DataGenerator allows to work with any files
class Gen(tf.keras.utils.Sequence):
    def __init__(self, x_set, y_set, batch_size):
        self.x, self.y = x_set, y_set
        self.batch_size = batch_size
    def __len__(self):
        return math.ceil(len(self.x) / self.batch_size)
    def __getitem__(self, idx):
        batch_x = self.x[idx * self.batch_size:(idx + 1) *
        self.batch_size]
        batch_y = self.y[idx * self.batch_size:(idx + 1) *
        self.batch_size]
        return np.array([
            tiff.imread(file_name_x)
               for file_name_x in batch_x]), np.array([
            tiff.imread(file_name_y)
               for file_name_y in batch_y])
It works anyway without any problem
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With