
How to shuffle two numpy datasets using TensorFlow 2.0?

I wish to write a function in TensorFlow 2.0 that shuffles data and their target labels before each training iteration.

Let's say I have two numpy datasets, X and y, representing data and labels for classification. How can I shuffle them at the same time?

Using sklearn it's pretty easy:

from sklearn.utils import shuffle
X, y = shuffle(X, y)

How can I do the same in TensorFlow 2.0? The only tool I found in the documentation is tf.random.shuffle, but it takes only one object at a time, and I need to feed it two.

Leevo asked Sep 18 '19 10:09



2 Answers

Instead of shuffling the two arrays directly, it's much easier to shuffle their indices. First, generate a list of indices:

indices = tf.range(start=0, limit=tf.shape(x_data)[0], dtype=tf.int32)

then shuffle these indices

idx = tf.random.shuffle(indices)

and use these indices to shuffle the data

x_data = tf.gather(x_data, idx)
y_data = tf.gather(y_data, idx)

and you'll have shuffled data. Because both arrays are gathered with the same idx, each row stays paired with its label.
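The steps above can be wrapped into one helper. This is only a minimal sketch: the function name shuffle_together and the toy arrays are made up for illustration, and it assumes both inputs have the same length along the first axis.

```python
import numpy as np
import tensorflow as tf


def shuffle_together(x_data, y_data):
    """Reorder two arrays/tensors along axis 0 with one shared random permutation.

    Hypothetical helper name; assumes x_data and y_data have equal length.
    """
    n = tf.shape(x_data)[0]
    # A shuffled copy of [0, 1, ..., n-1] acts as the permutation.
    idx = tf.random.shuffle(tf.range(n))
    # Gathering both inputs with the same idx keeps the pairs aligned.
    return tf.gather(x_data, idx), tf.gather(y_data, idx)


# Toy data (illustrative only): row i of X is [10*i, 10*i + 1], its label is i.
X = np.array([[10 * i, 10 * i + 1] for i in range(5)])
y = np.arange(5)

X_shuffled, y_shuffled = shuffle_together(X, y)

# Each shuffled row should still sit next to its original label.
print(X_shuffled.numpy())
print(y_shuffled.numpy())
```

Calling the helper again draws a fresh permutation, so it can be invoked once per epoch if the data should be reshuffled every time.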

Imtinan Azhar answered Oct 10 '22 00:10


If you just want to shuffle two arrays in the same way, you can do:

import tensorflow as tf

# Assuming X and y are initially NumPy arrays
X = tf.convert_to_tensor(X)
y = tf.convert_to_tensor(y)
# Make random permutation
perm = tf.random.shuffle(tf.range(tf.shape(X)[0]))
# Reorder according to permutation
X = tf.gather(X, perm, axis=0)
y = tf.gather(y, perm, axis=0)

However, you may consider using a tf.data.Dataset, which already provides a shuffle method.

import tensorflow as tf

# In graph mode (TF 1.x style, under tf.compat.v1 in TF 2.0) you may use a placeholder instead
# (see https://www.tensorflow.org/guide/datasets#consuming_numpy_arrays)
ds = tf.data.Dataset.from_tensor_slices((X, y))
# Shuffle with some buffer size (len(X) will use a buffer as big as X)
ds = ds.shuffle(buffer_size=len(X))
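
To tie this back to the question (shuffling before each training iteration), here is a small sketch of iterating over such a dataset for two epochs. The toy arrays, the batch size of 2 and the variable names are assumptions for illustration. It relies on reshuffle_each_iteration, which defaults to True, so the dataset is reshuffled every time it is iterated over.

```python
import numpy as np
import tensorflow as tf

# Toy data (illustrative only): row i of X is [2*i, 2*i + 1], its label is i.
X = np.arange(12, dtype=np.float32).reshape(6, 2)
y = np.arange(6)

ds = tf.data.Dataset.from_tensor_slices((X, y))
# A buffer as large as the data gives a full shuffle; batch after shuffling.
ds = ds.shuffle(buffer_size=len(X)).batch(2)

epochs = []      # label order seen in each epoch
aligned = True   # stays True if every row keeps its own label

for epoch in range(2):
    labels = []
    # Each new pass over ds triggers a new shuffle (reshuffle_each_iteration=True).
    for x_batch, y_batch in ds:
        # The first feature of row i is 2*i, so it must equal twice the label.
        aligned = aligned and np.array_equal(x_batch.numpy()[:, 0], 2 * y_batch.numpy())
        labels.extend(y_batch.numpy().tolist())
    epochs.append(labels)

print(epochs)
print(aligned)
```

Because features and labels travel through the pipeline as one tuple, they cannot get out of sync, which is the main reason to prefer this over shuffling the arrays by hand.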

jdehesa answered Oct 10 '22 00:10