I wish to write a function in TensorFlow 2.0 that shuffles data and their target labels before each training iteration.
Let's say I have two numpy datasets, X and y, representing data and labels for classification. How can I shuffle them at the same time?
Using sklearn, it's pretty easy:
from sklearn.utils import shuffle
X, y = shuffle(X, y)
How can I do the same in TensorFlow 2.0? The only tool I found in the documentation is tf.random.shuffle, but it takes only one object at a time, and I need to feed it two.
The tf.random.shuffle function randomly shuffles a tensor along its first dimension, while the tf.data.Dataset.shuffle() method randomly shuffles the elements of a dataset.
One of the easiest ways to shuffle a Pandas DataFrame is to use the df.sample method, which returns a number of rows in random order. Asking it for the entire DataFrame (frac=1) therefore returns all rows, shuffled.
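Another suggestion is to pass the same 'seed' keyword value, say seed=8, to tf.random.shuffle (tf.random_shuffle in TF 1.x) for both the labels and the data. This is fragile, though: in TensorFlow 2 eager execution, repeated calls with the same operation-level seed are not guaranteed to produce the same permutation, so shuffling a shared set of indices is safer.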
Instead of shuffling x and y directly, it's much easier to shuffle their indices. First, generate a tensor of indices:
indices = tf.range(start=0, limit=tf.shape(x_data)[0], dtype=tf.int32)
Then shuffle these indices:
idx = tf.random.shuffle(indices)
And use the shuffled indices to reorder the data:
x_data = tf.gather(x_data, idx)
y_data = tf.gather(y_data, idx)
Both arrays are now shuffled with the same permutation, so each sample stays paired with its label.
If you just want to shuffle two arrays in the same way, you can do:
import tensorflow as tf
# Assuming X and y are initially NumPy arrays
X = tf.convert_to_tensor(X)
y = tf.convert_to_tensor(y)
# Make random permutation
perm = tf.random.shuffle(tf.range(tf.shape(X)[0]))
# Reorder according to permutation
X = tf.gather(X, perm, axis=0)
y = tf.gather(y, perm, axis=0)
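Since the question asks for a function, the permutation approach above can be wrapped in one. This is only a minimal sketch: the name shuffle_together and the toy arrays are made up for illustration, and it assumes both inputs have the same length along the first axis.

```python
import numpy as np
import tensorflow as tf


def shuffle_together(x, y):
    """Apply one random permutation to both x and y along axis 0.

    Hypothetical helper, not part of TensorFlow.
    """
    x = tf.convert_to_tensor(x)
    y = tf.convert_to_tensor(y)
    # One permutation of the row indices, shared by both tensors
    perm = tf.random.shuffle(tf.range(tf.shape(x)[0]))
    return tf.gather(x, perm, axis=0), tf.gather(y, perm, axis=0)


# Toy data: row i of X is [i, i] and its label is i, so pairing is easy to check
X = np.stack([np.arange(10), np.arange(10)], axis=1).astype(np.float32)
y = np.arange(10)

X_shuf, y_shuf = shuffle_together(X, y)
```

Calling the function again draws a fresh permutation, so it can be invoked once per epoch before slicing the tensors into batches.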
However, you may consider using a tf.data.Dataset, which already provides a shuffle method.
import tensorflow as tf
# In TF 1.x graph mode you may feed the arrays through a placeholder instead
# (see https://www.tensorflow.org/guide/datasets#consuming_numpy_arrays)
ds = tf.data.Dataset.from_tensor_slices((X, y))
# Shuffle with some buffer size (len(X) will use a buffer as big as X)
ds = ds.shuffle(buffer_size=len(X))
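To shuffle before each training iteration, as the question asks, the dataset can be iterated once per epoch. The sketch below uses made-up toy arrays and an arbitrary batch size of 4; reshuffle_each_iteration=True (which is also the documented default) asks for a new order on every pass over the dataset.

```python
import numpy as np
import tensorflow as tf

# Toy data: row i of X is [2i, 2i + 1] and its label is i
X = np.arange(20, dtype=np.float32).reshape(10, 2)
y = np.arange(10)

ds = tf.data.Dataset.from_tensor_slices((X, y))
# A buffer as large as the data gives a full shuffle of all elements
ds = ds.shuffle(buffer_size=len(X), reshuffle_each_iteration=True).batch(4)

seen = []  # label order observed in each epoch
for epoch in range(2):
    labels = []
    for xb, yb in ds:
        # Each row still travels with its own label: first column == 2 * label
        assert np.array_equal(xb.numpy()[:, 0], 2.0 * yb.numpy())
        labels.extend(yb.numpy().tolist())
    seen.append(labels)
```

Because the data and labels are stored as one (X, y) element, they can never fall out of step, which is the main advantage over shuffling two tensors separately.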