Does tf.data.Dataset support generating a dictionary structure?

The following code is from [https://www.tensorflow.org/programmers_guide/datasets]. In this example, the function passed to map is a user-defined function that reads the data, and we have to specify its output types as [tf.uint8, label.dtype].

import cv2
import tensorflow as tf

# Use a custom OpenCV function to read the image, instead of the standard
# TensorFlow `tf.read_file()` operation.
def _read_py_function(filename, label):
  # tf.py_func passes string tensors as bytes, so decode before calling OpenCV.
  image_decoded = cv2.imread(filename.decode(), cv2.IMREAD_GRAYSCALE)
  return image_decoded, label

# Use standard TensorFlow operations to resize the image to a fixed shape.
def _resize_function(image_decoded, label):
  image_decoded.set_shape([None, None, None])
  image_resized = tf.image.resize_images(image_decoded, [28, 28])
  return image_resized, label

filenames = ["/var/data/image1.jpg", "/var/data/image2.jpg", ...]
labels = [0, 37, 29, 1, ...]

dataset = tf.data.Dataset.from_tensor_slices((filenames, labels))
dataset = dataset.map(
  lambda filename, label: tuple(tf.py_func(
    _read_py_function, [filename, label], [tf.uint8, label.dtype])))
dataset = dataset.map(_resize_function)

My question is: if we want _read_py_function() to output a Python dictionary, how do we set the output types? Is there a built-in data type such as tf.dict? For example:

def _read_py_function(filename):
  image_filename = filename[0]
  label_filename = filename[1]
  image_id = filename[2]
  image_age = filename[3]
  image_decoded = cv2.imread(image_filename, cv2.IMREAD_GRAYSCALE)
  label_decoded = cv2.imread(label_filename, cv2.IMREAD_GRAYSCALE)
  return {'image': image_decoded, 'label': label_decoded, 'id': image_id, 'age': image_age}

Then, how do we design the dataset.map() function?

mining asked Jan 03 '23 22:01

1 Answer

Returning dicts inside the function called by tf.data.Dataset.map should work as expected.

Here is an example:

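import tensorflow as tf

dataset = tf.data.Dataset.range(10)
# Map each element to a dict; no output types need to be declared.
dataset = dataset.map(lambda x: {'a': x, 'b': 2 * x})
# The next map receives that dict as its single argument.
dataset = dataset.map(lambda y: y['a'] + y['b'])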

res = dataset.make_one_shot_iterator().get_next()

with tf.Session() as sess:
    for i in range(10):
        assert sess.run(res) == 3 * i
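
One caveat for the question's setup: as far as I know there is no tf.dict type, and tf.py_func (tf.numpy_function in TensorFlow 2.x) can only return a flat list of tensors whose dtypes are listed in order. A possible workaround is to build the dict in the map function, after the Python call returns. The sketch below assumes TensorFlow 2.x with eager execution; the fake 4x4 image, the integer ids and ages, and the helper name _to_dict are made up for illustration and stand in for the cv2.imread calls in the question.

```python
import numpy as np
import tensorflow as tf  # assumption: TensorFlow 2.x (eager execution)


def _read_py_function(image_id, age):
    # Hypothetical stand-in for cv2.imread: builds a fake 4x4 grayscale image
    # so the sketch runs without any image files on disk.
    image = np.full((4, 4), image_id % 256, dtype=np.uint8)
    return image, image_id, age


def _to_dict(image_id, age):
    # tf.numpy_function returns a flat list of tensors; the third argument
    # lists one dtype per output, in order.
    image, image_id, age = tf.numpy_function(
        _read_py_function, [image_id, age], [tf.uint8, tf.int64, tf.int64])
    # Static shapes are lost across the Python call, so restore them.
    image.set_shape([4, 4])
    image_id.set_shape([])
    age.set_shape([])
    # Pack the tensors into a dict with ordinary Python code.
    return {'image': image, 'id': image_id, 'age': age}


ids = np.array([1, 2, 3], dtype=np.int64)
ages = np.array([30, 41, 52], dtype=np.int64)
dataset = tf.data.Dataset.from_tensor_slices((ids, ages)).map(_to_dict)

# Each element comes back as a dict of NumPy values.
elements = list(dataset.as_numpy_iterator())
```

Each element of the dataset is then a dict with the keys 'image', 'id' and 'age', and later map calls can index it by key, as in the example above.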
Olivier Moindrot answered Jan 13 '23 12:01