cast tensorflow 2.0 BatchDataset to numpy array

I have this code:

(train_images, _), (test_images, _) = tf.keras.datasets.mnist.load_data()

train_dataset = tf.data.Dataset.from_tensor_slices(train_images).shuffle(TRAIN_BUF).batch(BATCH_SIZE)
test_dataset = tf.data.Dataset.from_tensor_slices(test_images).shuffle(TRAIN_BUF).batch(BATCH_SIZE)

print(train_dataset, type(train_dataset), test_dataset, type(test_dataset))

I want to convert these two BatchDataset objects to NumPy arrays. Is there an easy way to do it? I am using TF 2.0, but so far I have only found code that converts tf.data datasets in TF 1.0.

asked Sep 04 '19 15:09

mhery

1 Answer

After a dataset is batched, the last batch may not have the same shape as the other batches. For example, if the dataset has 100 elements in total and you batch it with a size of 6, the last batch will contain only 4 elements (100 = 6 * 16 + 4).
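The arithmetic above can be sketched in plain Python. The helper name `batch_sizes` is hypothetical and only illustrates the counting; it is not part of TensorFlow.

```python
def batch_sizes(num_elements, batch_size):
    """Return the size of every batch when the remainder is kept (hypothetical helper)."""
    full_batches, remainder = divmod(num_elements, batch_size)
    sizes = [batch_size] * full_batches
    if remainder:
        # The leftover elements form one final, smaller batch.
        sizes.append(remainder)
    return sizes

sizes = batch_sizes(100, 6)
print(len(sizes), sizes[-1])  # 17 batches in total, the last one holds 4 elements
```

The same calculation for the 60000 MNIST training images with a batch size of 64 gives 937 full batches plus a final batch of 32 images.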

In such cases the batches cannot be stacked into a single NumPy array directly, because they do not all have the same shape. To avoid this, set the drop_remainder parameter of the batch method to True. It drops the last batch if it is smaller than the batch size.
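As a small illustration of the effect of drop_remainder, the sketch below batches a toy dataset of 100 integers (an assumption made for brevity, not the MNIST data from the question) and compares the batch sizes with and without the flag. It assumes TF 2.x with eager execution enabled.

```python
import tensorflow as tf

# Toy dataset: the integers 0..99, batched in groups of 6.
ragged = tf.data.Dataset.range(100).batch(6)
uniform = tf.data.Dataset.range(100).batch(6, drop_remainder=True)

# Collect the first dimension of every batch.
ragged_sizes = [int(batch.shape[0]) for batch in ragged]
uniform_sizes = [int(batch.shape[0]) for batch in uniform]

print(len(ragged_sizes), ragged_sizes[-1])      # 17 batches, the last one has 4 elements
print(len(uniform_sizes), set(uniform_sizes))   # 16 batches, all of size 6
```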

The code below shows how to convert the datasets to NumPy arrays once the remainder is dropped.

import tensorflow as tf
import numpy as np

(train_images, _), (test_images, _) = tf.keras.datasets.mnist.load_data()

TRAIN_BUF=1000
BATCH_SIZE=64

# Parentheses allow the method chain to continue across lines.
train_dataset = (tf.data.Dataset.from_tensor_slices(train_images)
                 .shuffle(TRAIN_BUF).batch(BATCH_SIZE, drop_remainder=True))
test_dataset = (tf.data.Dataset.from_tensor_slices(test_images)
                .shuffle(TRAIN_BUF).batch(BATCH_SIZE, drop_remainder=True))

# print(train_dataset, type(train_dataset), test_dataset, type(test_dataset))

train_np = np.stack(list(train_dataset))
test_np = np.stack(list(test_dataset))
print(type(train_np), train_np.shape)
print(type(test_np), test_np.shape)

Output:

<class 'numpy.ndarray'> (937, 64, 28, 28)
<class 'numpy.ndarray'> (156, 64, 28, 28)
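If the batch dimension is not needed and no samples should be discarded, one possible alternative (a sketch, not part of the original answer) is to concatenate the batches along the first axis instead of stacking them. Concatenation only requires the remaining dimensions to match, so a smaller final batch is not a problem. The example uses a small synthetic array in place of MNIST and assumes TF 2.x with eager execution.

```python
import numpy as np
import tensorflow as tf

# Synthetic stand-in for image data: 10 "images" of shape 2x2 (an assumption for brevity).
images = np.arange(10 * 2 * 2, dtype=np.uint8).reshape(10, 2, 2)

# Batch size 4 leaves a final batch of only 2 elements; the remainder is kept.
dataset = tf.data.Dataset.from_tensor_slices(images).batch(4)

# Convert each eager tensor to NumPy, then join the batches along axis 0.
batches = [batch.numpy() for batch in dataset]
flat = np.concatenate(batches, axis=0)

print(type(flat), flat.shape)
```

The result has one row per sample, here (10, 2, 2), rather than one row per batch. With shuffle in the pipeline the order of the rows would differ from the input array.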

answered Sep 22 '22 14:09

Prasad

Tags: python, casting, tensorflow
