
Understanding how to use tf.data.Dataset.map()

I am converting some code that originally used JPEGs as input so that it uses MATLAB MAT files instead. The code contains the lines:

train_dataset = tf.data.Dataset.list_files(PATH + 'train/*.mat')
train_dataset = train_dataset.shuffle(BUFFER_SIZE) 
train_dataset = train_dataset.map(load_image_train)

If I loop through the dataset and print() each element before map(), I get a set of tensors with the file paths visible.

However, within the load_image_train function this is not the case. There, the output of print() is:

Tensor("add:0", shape=(), dtype=string)

I would like to use the scipy.io.loadmat() function to get the data from my mat files but it fails because the path is a tensor and not a string. What does dataset.map() do that appears to make the literal string value no longer visible? How do I extract the string so I can use it as input for scipy.io.loadmat()?

Apologies if this is a stupid question; I am relatively new to TensorFlow and still trying to understand it. Most of the discussion I can find on related issues applies only to TF v1. Thank you for any help!

asked Apr 02 '20 17:04 by asher1213




1 Answer

In the code below, I use tf.data.Dataset.list_files to read the file path of an image. In the map function, I load the image and apply tf.image.central_crop, which keeps the central part of the image for a given fraction; here the fraction is drawn with np.random.uniform(0.50, 1.00).

As you rightly mentioned, it is difficult to read the file because the file path is of tf.string type, whereas load_img or any other function that reads an image file requires a plain Python string.
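To illustrate why the value is not visible: the function passed to map() is traced into a graph rather than run eagerly, so its argument is a symbolic placeholder and not a concrete value. The following is only a small sketch of that behavior; the file names and the inspect helper are made up for illustration.

```python
import tensorflow as tf

trace_log = []  # records whether eager execution was active inside map()

def inspect(path):
    # This body runs while map() traces the function, not once per element,
    # and `path` is a symbolic tensor at that point.
    trace_log.append(tf.executing_eagerly())
    return path

# Hypothetical file names, used only to have something to iterate over.
paths = tf.data.Dataset.from_tensor_slices(["a.mat", "b.mat"])
mapped = paths.map(inspect)

# Iterating the dataset outside map() is eager, so .numpy() works here.
before = [p.numpy().decode("utf-8") for p in paths]
after = [p.numpy().decode("utf-8") for p in mapped]
print(trace_log, before, after)
```

Outside map() the elements are eager tensors with readable values, which is why printing them before map() shows the paths.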

Here is how you can do it:

  1. Wrap your map function with tf.py_function, as in tf.py_function(load_file_and_process, [x], [tf.float32]). See the tf.py_function documentation for more details.
  2. Retrieve the Python string from the tf.string tensor with bytes.decode(path.numpy()).
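Applied to the MAT files from the question, the two steps above might look roughly like the sketch below. It is not tested against your data: the MAT_KEY variable name, the temporary demo files, and the 2-D float32 layout are assumptions made only so the example runs on its own.

```python
import os
import tempfile

import numpy as np
import scipy.io
import tensorflow as tf

# Assumption: each MAT file stores its array under this variable name.
MAT_KEY = "image"

def load_mat(path):
    # Inside tf.py_function the argument is an eager tensor, so .numpy()
    # returns the path as Python bytes.
    filename = path.numpy().decode("utf-8")
    data = scipy.io.loadmat(filename)[MAT_KEY]
    return data.astype(np.float32)

def load_mat_tf(path):
    # map() traces this function, so `path` is symbolic here; tf.py_function
    # defers the actual file reading to eager Python at iteration time.
    image = tf.py_function(load_mat, [path], tf.float32)
    # py_function drops static shape information; restore the known rank.
    image.set_shape([None, None])
    return image

# Write two small MAT files into a temporary directory for demonstration.
tmp_dir = tempfile.mkdtemp()
for i in range(2):
    scipy.io.savemat(
        os.path.join(tmp_dir, f"sample{i}.mat"),
        {MAT_KEY: np.full((4, 3), i, dtype=np.float64)},
    )

dataset = tf.data.Dataset.list_files(os.path.join(tmp_dir, "*.mat"))
dataset = dataset.map(load_mat_tf)

shapes = [tuple(element.shape) for element in dataset]
print(shapes)
```

In your pipeline you would replace the temporary directory with PATH + 'train/*.mat' and MAT_KEY with whatever variable name your files actually contain.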

Below is the complete image-loading code for your reference. Replace the path with your own image path when you run it.

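# Colab-only magic that selects TensorFlow 2.x; remove this line outside Colab.
%tensorflow_version 2.x
import tensorflow as tf
from keras.preprocessing.image import load_img
from keras.preprocessing.image import img_to_array, array_to_img
from matplotlib import pyplot as plt
import numpy as np

def load_file_and_process(path):
    # Runs eagerly inside tf.py_function, so path.numpy() returns the path as bytes.
    image = load_img(bytes.decode(path.numpy()), target_size=(224, 224))
    image = img_to_array(image)
    # Keep a random central fraction (between 50% and 100%) of the image.
    image = tf.image.central_crop(image, np.random.uniform(0.50, 1.00))
    return image

train_dataset = tf.data.Dataset.list_files('/content/bird.jpg')
# Wrap the Python function so it can be called from the traced map function.
train_dataset = train_dataset.map(lambda x: tf.py_function(load_file_and_process, [x], [tf.float32]))

# Because Tout is a list, each dataset element holds a single image tensor.
for f in train_dataset:
    for l in f:
        image = np.array(array_to_img(l))
        plt.imshow(image)
        plt.show()  # needed outside notebooks to display the figure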


Hope this answers your question. Happy Learning.

answered Oct 22 '22 12:10 by Tensorflow Warrior