I have a dict of "metadata" for my dataset, of the form
{'m1': array_1, 'm2': array_2, ...}.
Each of the arrays has shape (N, ...), where N is the number of samples.
The question:
Is it possible to create a tf.data.Dataset that outputs a dictionary {'meta_1': sub_array_1, 'meta_2': sub_array_2, ...}
on each call to the dataset's iterator.get_next()? Here, sub_array_i should contain the ith metadata for one batch, so it should have shape (batch_sz, ...).
What I have tried so far is tf.data.Dataset.from_generator(), like this:
N = 100
# dictionary of arrays:
metadata = {'m1': np.zeros(shape=(N,2)), 'm2': np.ones(shape=(N,3,5))}
num_samples = N
def meta_dict_gen():
    for i in range(num_samples):
        ls = {}
        for key, val in metadata.items():
            ls[key] = val[i]
        yield ls
dataset = tf.data.Dataset.from_generator(meta_dict_gen, output_types=(dict))
The problem seems to be output_types=(dict). The code above throws
TypeError: Expected DataType for argument 'Tout' not <class 'dict'>.
I'm using tensorflow 1.8 and python 3.6.
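It is possible to do what you intend; you just have to be specific about the contents of the dict. output_types expects a structure of tf.DType objects, not the Python type dict, so give a dtype (and optionally a shape) for each key: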
import tensorflow as tf
import numpy as np
N = 100
# dictionary of arrays:
metadata = {'m1': np.zeros(shape=(N,2)), 'm2': np.ones(shape=(N,3,5))}
num_samples = N
def meta_dict_gen():
    # yield one dict per sample, with the same keys as metadata
    for i in range(num_samples):
        ls = {}
        for key, val in metadata.items():
            ls[key] = val[i]
        yield ls
dataset = tf.data.Dataset.from_generator(
    meta_dict_gen,
    # one dtype and one per-sample shape for each key of the dict
    output_types={k: tf.float32 for k in metadata},
    output_shapes={'m1': (2,), 'm2': (3, 5)})
# named "iterator" to avoid shadowing the built-in iter()
iterator = dataset.make_one_shot_iterator()
next_elem = iterator.get_next()
print(next_elem)
Output:
{'m1': <tf.Tensor 'IteratorGetNext:0' shape=(2,) dtype=float32>,
'm2': <tf.Tensor 'IteratorGetNext:1' shape=(3, 5) dtype=float32>}
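The dataset above yields one sample per element, while the question asks for values of shape (batch_sz, ...). Calling dataset.batch(batch_sz) on it is one way to get there. As an alternative, here is a minimal NumPy-only sketch of a generator that slices whole batches itself, and that derives the per-sample shapes from the arrays instead of hard-coding them. The names batch_sz = 32, sample_shapes and meta_batch_gen are assumptions made for illustration; they are not from the original post.

```python
import numpy as np

N = 100
batch_sz = 32  # hypothetical batch size, chosen only for this sketch

# same dictionary of arrays as in the question
metadata = {'m1': np.zeros(shape=(N, 2)), 'm2': np.ones(shape=(N, 3, 5))}

# per-sample shapes: drop the leading sample axis of each array
sample_shapes = {key: val.shape[1:] for key, val in metadata.items()}

def meta_batch_gen(batch_size=batch_sz):
    """Yield one dict per batch; each value has up to batch_size rows."""
    for start in range(0, N, batch_size):
        # slicing past the end is safe, so the last batch is just shorter
        yield {key: val[start:start + batch_size]
               for key, val in metadata.items()}

batches = list(meta_batch_gen())
print(sample_shapes)
print(len(batches), batches[0]['m2'].shape, batches[-1]['m2'].shape)
```

Because N is not a multiple of the batch size here, the last batch is smaller than the others, so the leading dimension is not fixed if such a generator is handed to from_generator.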