
Creating a tensorflow dataset that outputs a dict

I have a dict of "metadata" for my dataset, of the form {'m1': array_1, 'm2': array_2, ...}. Each of the arrays has shape (N, ...), where N is the number of samples.

The question: is it possible to create a tf.data.Dataset that outputs a dictionary {'meta_1': sub_array_1, 'meta_2': sub_array_2, ...} on each call to the dataset iterator's get_next()? Here, sub_array_i should contain the i-th metadata for one batch, so it should have shape (batch_sz, ...).

What I tried so far is using tf.data.Dataset.from_generator(), like this:

N = 100
# dictionary of arrays:
metadata = {'m1': np.zeros(shape=(N,2)), 'm2': np.ones(shape=(N,3,5))} 
num_samples = N

def meta_dict_gen():
    for i in range(num_samples):
        ls = {}
        for key, val in metadata.items():
            ls[key] = val[i]
        yield ls

dataset = tf.data.Dataset.from_generator(meta_dict_gen, output_types=(dict))

The problem seems to be output_types=(dict). The code above raises:

TypeError: Expected DataType for argument 'Tout' not <class 'dict'>.


I'm using TensorFlow 1.8 and Python 3.6.

asked Jul 02 '18 13:07 by dasWesen



1 Answer

It is possible to do what you intend; you just have to specify the type (and, optionally, the shape) of each entry in the dict:

import tensorflow as tf
import numpy as np

N = 100
# dictionary of arrays:
metadata = {'m1': np.zeros(shape=(N,2)), 'm2': np.ones(shape=(N,3,5))}
num_samples = N

def meta_dict_gen():
    for i in range(num_samples):
        ls = {}
        for key, val in metadata.items():
            ls[key] = val[i]
        yield ls

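# output_types must be a nested structure of tf.DType objects (here a dict
# with the same keys as the yielded dicts), not a Python type such as dict.
dataset = tf.data.Dataset.from_generator(
    meta_dict_gen,
    output_types={k: tf.float32 for k in metadata},
    output_shapes={'m1': (2,), 'm2': (3, 5)})
# Named `iterator` to avoid shadowing the built-in iter().
iterator = dataset.make_one_shot_iterator()
next_elem = iterator.get_next()
print(next_elem)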

Output:

{'m1': <tf.Tensor 'IteratorGetNext:0' shape=(2,) dtype=float32>,
 'm2': <tf.Tensor 'IteratorGetNext:1' shape=(3, 5) dtype=float32>}
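
Note that the elements above are single samples, not batches. Since the metadata is already held in memory as NumPy arrays, a possible alternative is tf.data.Dataset.from_tensor_slices(), which accepts a dict directly and infers the types and shapes itself; chaining batch() then yields entries of shape (batch_sz, ...), as asked in the question. The following is only a minimal sketch, assuming TensorFlow 2.x with eager execution, where a dataset can be iterated directly (in TensorFlow 1.x you would instead create an iterator with make_one_shot_iterator() and call get_next(), as above). The value batch_sz = 10 is an arbitrary choice for illustration.

```python
import numpy as np
import tensorflow as tf

N = 100
batch_sz = 10  # illustrative value, not taken from the question

# dictionary of arrays, each with N samples along the first axis:
metadata = {'m1': np.zeros(shape=(N, 2)), 'm2': np.ones(shape=(N, 3, 5))}

# from_tensor_slices slices every array along its first axis, so each
# element is a dict {'m1': shape (2,), 'm2': shape (3, 5)}.
# batch() then stacks batch_sz consecutive elements per key.
dataset = tf.data.Dataset.from_tensor_slices(metadata).batch(batch_sz)

# Assumes eager execution (TensorFlow 2.x): take the first batch directly.
first_batch = next(iter(dataset))
batch_shapes = {key: tuple(val.shape.as_list())
                for key, val in first_batch.items()}
print(batch_shapes)
```

With this approach the dtypes follow the NumPy arrays (float64 here) rather than the float32 declared in the generator version; add a map() with tf.cast if float32 is needed.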
answered Nov 05 '22 00:11 by jdehesa