
cast tensorflow 2.0 BatchDataset to numpy array

I have this code:

(train_images, _), (test_images, _) = tf.keras.datasets.mnist.load_data()

train_dataset = tf.data.Dataset.from_tensor_slices(train_images).shuffle(TRAIN_BUF).batch(BATCH_SIZE)
test_dataset = tf.data.Dataset.from_tensor_slices(test_images).shuffle(TRAIN_BUF).batch(BATCH_SIZE)

print(train_dataset, type(train_dataset), test_dataset, type(test_dataset))

I want to convert these two BatchDataset objects to NumPy arrays. Is there an easy way to do it? I am using TF 2.0, but I have only found code that converts tf.data datasets in TF 1.0.

mhery asked Sep 04 '19 15:09




1 Answer

After a dataset is batched, the last batch may not have the same shape as the other batches. For example, if the dataset has 100 elements in total and you batch with a size of 6, the last batch will contain only 4 elements (100 = 6 * 16 + 4).

In that case the dataset cannot be converted to a single NumPy array directly, because the batches cannot be stacked into one regular array. To avoid this, set the drop_remainder parameter of the batch method to True. It drops the last batch if it is smaller than the batch size.
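The batch arithmetic above can be checked with plain Python. This is only a minimal sketch: `batch_sizes` is a hypothetical helper written for illustration, not part of TensorFlow, and it assumes that batching simply fills batches in order and leaves the leftover elements in a final, smaller batch.

```python
def batch_sizes(num_elements, batch_size, drop_remainder=False):
    """Return the size of every batch produced from num_elements items.

    Hypothetical helper that mimics the batching arithmetic only;
    it does not touch TensorFlow.
    """
    full_batches, remainder = divmod(num_elements, batch_size)
    sizes = [batch_size] * full_batches
    # The leftover elements form a smaller last batch unless they are dropped.
    if remainder and not drop_remainder:
        sizes.append(remainder)
    return sizes


# 100 elements with a batch size of 6: 16 full batches plus one batch of 4.
sizes = batch_sizes(100, 6)
print(len(sizes), sizes[-1])  # 17 4

# With drop_remainder=True only the 16 full batches remain.
print(len(batch_sizes(100, 6, drop_remainder=True)))  # 16

# MNIST: 60000 training and 10000 test images with a batch size of 64.
print(len(batch_sizes(60000, 64, drop_remainder=True)))  # 937
print(len(batch_sizes(10000, 64, drop_remainder=True)))  # 156
```

Note that dropping the remainder discards data: 60000 - 937 * 64 = 32 training images and 10000 - 156 * 64 = 16 test images never reach the resulting array.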

The code below shows how to convert the datasets to NumPy arrays.

import tensorflow as tf
import numpy as np

(train_images, _), (test_images, _) = tf.keras.datasets.mnist.load_data()

TRAIN_BUF=1000
BATCH_SIZE=64

# Wrap each chain in parentheses so it can span several lines.
train_dataset = (tf.data.Dataset.from_tensor_slices(train_images)
                 .shuffle(TRAIN_BUF)
                 .batch(BATCH_SIZE, drop_remainder=True))
test_dataset = (tf.data.Dataset.from_tensor_slices(test_images)
                .shuffle(TRAIN_BUF)
                .batch(BATCH_SIZE, drop_remainder=True))

# print(train_dataset, type(train_dataset), test_dataset, type(test_dataset))

train_np = np.stack(list(train_dataset))
test_np = np.stack(list(test_dataset))
print(type(train_np), train_np.shape)
print(type(test_np), test_np.shape)

Output:

<class 'numpy.ndarray'> (937, 64, 28, 28)
<class 'numpy.ndarray'> (156, 64, 28, 28)
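If you would rather keep every sample, or want a flat (N, 28, 28) array instead of one grouped by batch, concatenating along the first axis may be an option. The sketch below uses NumPy only, with small stand-in arrays in place of real dataset batches. It assumes the batches are already NumPy arrays, which in newer TF 2.x releases could be obtained with dataset.as_numpy_iterator(); I have not run this against TF 2.0 itself.

```python
import numpy as np

# Stand-in for the batches a dataset yields without drop_remainder:
# 10 fake 28x28 "images" split into batches of 4 give sizes 4, 4 and 2.
images = np.arange(10 * 28 * 28).reshape(10, 28, 28)
batches = [images[i:i + 4] for i in range(0, len(images), 4)]

# np.stack needs identical shapes, so the smaller last batch makes it fail.
try:
    np.stack(batches)
    stack_ok = True
except ValueError:
    stack_ok = False

# np.concatenate only needs the trailing dimensions to match,
# so it keeps all 10 images and returns a flat array.
flat = np.concatenate(batches, axis=0)
print(stack_ok, flat.shape)  # False (10, 28, 28)

# An array built from equally sized batches can be flattened with reshape.
stacked = np.stack(batches[:2])          # shape (2, 4, 28, 28)
unbatched = stacked.reshape(-1, 28, 28)  # shape (8, 28, 28)
print(unbatched.shape)  # (8, 28, 28)
```

Because the datasets are shuffled, the rows of the resulting array will not be in the original order of train_images or test_images.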
Prasad answered Sep 22 '22 14:09