How to use dataset.shard in TensorFlow?

Tags:

tensorflow

tensorflow-datasets

I have recently been looking into the Dataset API in TensorFlow, and there is a method, dataset.shard(), that is intended for distributed computation.

This is what TensorFlow's documentation states:

Creates a Dataset that includes only 1/num_shards of this dataset.

d = tf.data.TFRecordDataset(FLAGS.input_file)
d = d.shard(FLAGS.num_workers, FLAGS.worker_index)
d = d.repeat(FLAGS.num_epochs)
d = d.shuffle(FLAGS.shuffle_buffer_size)
d = d.map(parser_fn, num_parallel_calls=FLAGS.num_map_threads)

This method is said to return a portion of the original dataset. If I have two workers, am I supposed to do the following?

d_0 = d.shard(FLAGS.num_workers, worker_0)
d_1 = d.shard(FLAGS.num_workers, worker_1)
......
iterator_0 = d_0.make_initializable_iterator()
iterator_1 = d_1.make_initializable_iterator()

for worker_id in workers:
    with tf.device(worker_id):
        if worker_id == 0:
            data = iterator_0.get_next()
        else:
            data = iterator_1.get_next()
        ......

The documentation does not specify how to make the subsequent calls, so I am a bit confused here.

Thanks!


asked Feb 13 '18 13:02

Jiang Wenbo

1 Answer

You should take a look at the tutorial on Distributed TensorFlow first to better understand how it works.

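You have multiple workers that each run the same code, with one small difference: each worker has a different FLAGS.worker_index.

When you call tf.data.Dataset.shard, you pass in this worker index, and the data is split evenly between the workers: each worker keeps every num_workers-th element, starting at its own index. Each worker therefore calls shard once, with its own index, rather than one script building a separate sharded dataset for every worker.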
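To make the "same code, different index" idea concrete, here is a minimal sketch in plain Python (no TensorFlow involved) of how a worker script might receive its index. The flag names mirror FLAGS.num_workers and FLAGS.worker_index from the snippets here, but the argparse wiring, the parse_worker_flags helper, and the train.py file name are assumptions made for illustration, not something TensorFlow provides.

```python
import argparse


def parse_worker_flags(argv):
    """Parse the worker count (shared) and worker index (unique per worker)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--num_workers", type=int, required=True)
    parser.add_argument("--worker_index", type=int, required=True)
    flags = parser.parse_args(argv)
    # A shard index only makes sense inside [0, num_workers).
    if not 0 <= flags.worker_index < flags.num_workers:
        raise ValueError("worker_index must be in [0, num_workers)")
    return flags


# Every worker would launch the same (hypothetical) script with its own index:
#   python train.py --num_workers=2 --worker_index=0
#   python train.py --num_workers=2 --worker_index=1
flags = parse_worker_flags(["--num_workers=2", "--worker_index=1"])
print(flags.num_workers, flags.worker_index)
```

The values parsed this way would then be handed to a single shard call on that worker, so no worker needs to know about the other workers' datasets.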

Here is an example with 3 workers.

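import tensorflow as tf  # TensorFlow 1.x API (tf.Session, one-shot iterators)

# FLAGS.num_workers and FLAGS.worker_index are assumed to be defined elsewhere
# (for example with tf.app.flags); every worker gets a different worker_index.
dataset = tf.data.Dataset.range(6)
dataset = dataset.shard(FLAGS.num_workers, FLAGS.worker_index)

iterator = dataset.make_one_shot_iterator()
res = iterator.get_next()

# Suppose you have 3 workers in total: each shard then holds 2 elements
with tf.Session() as sess:
    for i in range(2):
        print(sess.run(res))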

We will have the output:

0, 3 on worker 0
1, 4 on worker 1
2, 5 on worker 2
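
The split above can be reproduced without TensorFlow. The following is a minimal sketch of the selection rule, assuming shard keeps the elements whose position modulo the number of shards equals the given index. The shard function here is a hypothetical pure-Python stand-in written for this illustration, not the TensorFlow implementation.

```python
def shard(elements, num_shards, index):
    """Keep the elements whose position i satisfies i % num_shards == index."""
    if not 0 <= index < num_shards:
        raise ValueError("index must be in [0, num_shards)")
    return [x for i, x in enumerate(elements) if i % num_shards == index]


num_workers = 3
data = list(range(6))

# In a real job each worker computes only its own shard; the loop over all
# indices here just lets us see every worker's view in one place.
shards = {w: shard(data, num_workers, w) for w in range(num_workers)}
for worker_index, items in shards.items():
    print(f"worker {worker_index}: {items}")
# worker 0: [0, 3]
# worker 1: [1, 4]
# worker 2: [2, 5]
```

The shards do not overlap and together cover the whole dataset, which is why each worker needs only its own index and the total worker count.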


answered Oct 12 '22 12:10

Olivier Moindrot
