
Is tf.data.Dataset the same as DatasetV1Adapter?

When I use:

training_ds = tf.data.Dataset.from_generator(SomeTrainingDirectoryIterator, (tf.float32, tf.float32))

I expect it to return a TensorFlow Dataset, but instead training_ds is a DatasetV1Adapter object. Are they essentially the same thing? If not, can I convert the DatasetV1Adapter to a tf.data.Dataset object?

Also, what is the best way to loop over and view my dataset? If I call:

def show_batch(dataset):
    for batch, head in dataset.take(1):
        for labels, value in batch.items():
            print("{:20s}: {}".format(labels, value.numpy()))

with training_ds as my dataset, I get this error:

AttributeError: 'tensorflow.python.framework.ops.EagerTensor' object has no attribute 'items'

UPDATE: I upgraded TensorFlow from 1.14 to 2.0, and now the dataset is a FlatMapDataset. This is still not the object I expected. Why am I not getting a regular tf.data.Dataset?

theMoreYouNgo asked Feb 19 '20 23:02


1 Answer

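In TensorFlow 1.x (for example 1.14), from_generator returns a DatasetV1Adapter. In TensorFlow 2.0 and later, it returns a FlatMapDataset, which matches what you see after upgrading. Both classes are subclasses of tf.data.Dataset, so no conversion is needed: you can use the object anywhere a tf.data.Dataset is expected.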
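Whichever concrete class gets reported, you can check it against tf.data.Dataset with isinstance. Here is a minimal sketch, assuming TensorFlow 2.x with eager execution; the generator gen is a hypothetical stand-in for SomeTrainingDirectoryIterator, not the asker's actual iterator.

```python
import tensorflow as tf

# Hypothetical stand-in for the asker's iterator: yields (feature, label) pairs.
def gen():
    for i in range(3):
        yield float(i), float(i) * 2.0

training_ds = tf.data.Dataset.from_generator(gen, (tf.float32, tf.float32))

# The concrete class name depends on the TensorFlow version.
print(type(training_ds).__name__)

# The concrete class is still a tf.data.Dataset.
print(isinstance(training_ds, tf.data.Dataset))
```

The printed class name is an implementation detail, so code that needs a type check is probably better off testing against tf.data.Dataset than against the concrete name.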

The error you are seeing is not related to the type of dataset that from_generator returns, but to the way you are printing it. batch.items() works only if the generator yields data of type <class 'dict'>.
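As a rough sketch of how show_batch could cope with either kind of element, the version below checks the element type before printing. It assumes TensorFlow 2.x with eager execution and a flat (non-nested) element structure; describe_element and the component_N labels are names made up for this illustration.

```python
import tensorflow as tf

def describe_element(element):
    """Return (name, numpy_value) pairs for a dict, tuple, or single-tensor element."""
    if isinstance(element, dict):
        return [(str(key), value.numpy()) for key, value in element.items()]
    if isinstance(element, (tuple, list)):
        return [("component_{}".format(i), value.numpy())
                for i, value in enumerate(element)]
    # A dataset with a single component yields a bare tensor.
    return [("value", element.numpy())]

def show_batch(dataset, n=1):
    # Print the first n elements, one line per component.
    for element in dataset.take(n):
        for name, value in describe_element(element):
            print("{:20s}: {}".format(name, value))

# Usage with a small tuple-structured dataset (features, labels).
example_ds = tf.data.Dataset.from_tensor_slices(([1.0, 2.0], [0, 1]))
show_batch(example_ds)
```

Nested structures (for example a tuple whose first component is a dict) are not handled here; for those, something like tf.nest.flatten may be a better starting point.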

Example 1 - Here I use from_generator to create <class 'tuple'> data, so printing with batch.items() throws the same kind of error you are seeing. Instead, you can use list(dataset.as_numpy_iterator()) to print a whole (finite) dataset, or list(dataset.take(1).as_numpy_iterator()) to print a given number of records; with take(1) it prints just one record. Note that the generator below is infinite, so take() is required here, and that as_numpy_iterator() is available in TensorFlow 2.1 and later. I have added print statements to the code to make this clearer; the details are in the Output.

import itertools

import tensorflow as tf
print(tf.__version__)

def gen():
  for i in itertools.count(1):
    yield (i, [1] * i)

dataset = tf.data.Dataset.from_generator(
     gen,
     (tf.int64, tf.int64),
     (tf.TensorShape([]), tf.TensorShape([None])))

print("tf.data.Dataset type is:",dataset,"\n")

for batch in dataset.take(1):
  print("My type is of:",type(batch),"\n")

# This Works
print("Lets print just the first row in dataset :","\n",list(dataset.take(1).as_numpy_iterator()),"\n")

# This won't work because we have not created dict 
print("Lets print using the batch.items() :")
for batch in dataset.take(1):
  for m1,m2 in batch.items():
      print("{:20s}: {}".format(m1, m2))

Output -

2.2.0
tf.data.Dataset type is: <FlatMapDataset shapes: ((), (None,)), types: (tf.int64, tf.int64)> 

My type is of: <class 'tuple'> 

Lets print just the first row in dataset : 
 [(1, array([1]))] 

Lets print using the batch.items() :
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-11-27bbc2c21d24> in <module>()
     24 print("Lets print using the batch.items() :")
     25 for batch in dataset.take(1):
---> 26   for m1,m2 in batch.items():
     27       print("{:20s}: {}".format(m1, m2))

AttributeError: 'tuple' object has no attribute 'items'

Example 2 - Here I use from_generator to create <class 'dict'> data, so printing with batch.items() works without any issues. That said, you can still use list(dataset.as_numpy_iterator()) to print the dataset. I have added print statements to the code to make this clearer; the details are in the Output.

import tensorflow as tf

N = 100
# dictionary of arrays:
metadata = {'m1': tf.zeros(shape=(N,2)), 'm2': tf.ones(shape=(N,3,5))}
num_samples = N

def meta_dict_gen():
    for i in range(num_samples):
        ls = {}
        for key, val in metadata.items():
            ls[key] = val[i]
        yield ls

dataset = tf.data.Dataset.from_generator(
    meta_dict_gen,
    output_types={k: tf.float32 for k in metadata},
    output_shapes={'m1': (2,), 'm2': (3, 5)})

print("tf.data.Dataset type is:",dataset,"\n")

for batch in dataset.take(1):
  print("My type is of:",type(batch),"\n")

print("Lets print just the first row in dataset :","\n",list(dataset.take(1).as_numpy_iterator()),"\n")

print("Lets print using the batch.items() :")
for batch in dataset.take(1):
  for m1, m2 in batch.items():
    print("{:2s}: {}".format(m1, m2))

Output -

tf.data.Dataset type is: <FlatMapDataset shapes: {m1: (2,), m2: (3, 5)}, types: {m1: tf.float32, m2: tf.float32}> 

My type is of: <class 'dict'> 

Lets print just the first row in dataset : 
 [{'m1': array([0., 0.], dtype=float32), 'm2': array([[1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.]], dtype=float32)}] 

Lets print using the batch.items() :
m1: [0. 0.]
m2: [[1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]]
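If you would rather keep a tuple-yielding generator but still print with batch.items(), one option is to map each tuple to a dict. The sketch below reuses the generator from Example 1 and assumes TensorFlow 2.x with eager execution; the key names "index" and "values" are made up for illustration.

```python
import itertools

import tensorflow as tf

# Same tuple-yielding generator as in Example 1.
def gen():
    for i in itertools.count(1):
        yield (i, [1] * i)

dataset = tf.data.Dataset.from_generator(
    gen,
    (tf.int64, tf.int64),
    (tf.TensorShape([]), tf.TensorShape([None])))

# Give each tuple component a name so every element becomes a dict.
named = dataset.map(lambda index, values: {"index": index, "values": values})

# The generator is infinite, so limit the loop with take().
for batch in named.take(2):
    for key, value in batch.items():
        print("{:20s}: {}".format(key, value.numpy()))
```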

Hope this answers your question. Happy Learning.

Tensorflow Warrior answered Oct 03 '22 21:10