In TensorFlow 1.X you could change the batch size dynamically using a placeholder, e.g.:
dataset.batch(batch_size=tf.placeholder())
See full example
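For reference, the TF1-style pattern looked roughly like this (a sketch assuming a TF 1.x session and an initializable iterator, not the full linked example):

import tensorflow as tf  # TF 1.x

batch_size = tf.placeholder(tf.int64, shape=[])
dataset = tf.data.Dataset.range(100).batch(batch_size)
iterator = dataset.make_initializable_iterator()
next_batch = iterator.get_next()

with tf.Session() as sess:
    # Re-initialize the iterator whenever a different batch size is wanted.
    sess.run(iterator.initializer, feed_dict={batch_size: 5})
    print(sess.run(next_batch))   # first batch of 5
    sess.run(iterator.initializer, feed_dict={batch_size: 10})
    print(sess.run(next_batch))   # first batch of 10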
How do you do it in TensorFlow 2.0?
I have tried the following, but it doesn't work:
import numpy as np
import tensorflow as tf

def new_gen_function():
    for i in range(100):
        yield np.ones(2).astype(np.float32)

batch_size = tf.Variable(5, trainable=False, dtype=tf.int64)
train_ds = tf.data.Dataset.from_generator(new_gen_function, output_types=(tf.float32)).batch(
    batch_size=batch_size)

for data in train_ds:
    print(data.shape[0])
    batch_size.assign(10)
    print(batch_size)
Output
5
<tf.Variable 'Variable:0' shape=() dtype=int64, numpy=10>
5
<tf.Variable 'Variable:0' shape=() dtype=int64, numpy=10>
5
<tf.Variable 'Variable:0' shape=() dtype=int64, numpy=10>
5
...
...
I am training the model with a custom training loop using tf.GradientTape. How can I achieve this?
Batch size is the number of samples processed at a time during training or inference. With a fixed batch size, the batch size is determined by the value of N in the input shape. With a dynamic batch size, the N dimension is left unspecified and the actual batch size is chosen at run time.
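To illustrate the dynamic case (a minimal sketch, not part of the original question): a Keras model built in TF2 typically leaves the batch dimension as None, so it already accepts any batch size when called:

import tensorflow as tf

# The batch dimension is left unspecified (None), i.e. dynamic.
inputs = tf.keras.Input(shape=(2,))
outputs = tf.keras.layers.Dense(1)(inputs)
model = tf.keras.Model(inputs, outputs)
print(model.input_shape)              # (None, 2)

print(model(tf.ones((5, 2))).shape)   # (5, 1)
print(model(tf.ones((10, 2))).shape)  # (10, 1)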
Optimal batch size? From experience, in most cases a good default batch size is 64. In other cases you may choose 32, 64, or 128; values divisible by 8 are preferred. Note that this batch-size tuning should be guided by observed performance.
By default, in Keras model.fit, the batch size (batch_size) is 32.
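That default applies when calling model.fit on NumPy data without an explicit batch_size; a minimal sketch (with made-up data) to illustrate:

import numpy as np
import tensorflow as tf

model = tf.keras.Sequential([tf.keras.Input(shape=(2,)), tf.keras.layers.Dense(1)])
model.compile(optimizer="sgd", loss="mse")

x = np.ones((100, 2), dtype=np.float32)
y = np.ones((100, 1), dtype=np.float32)

model.fit(x, y, epochs=1)                 # default batch_size=32 -> 4 steps for 100 samples
model.fit(x, y, epochs=1, batch_size=10)  # explicit batch size -> 10 steps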
from_tensor_slices creates a dataset with a separate element for each row of the input tensor:

>>> t = tf.constant([[1, 2], [3, 4]])
>>> ds = tf.data.Dataset.from_tensor_slices(t)
>>> [x for x in ds]
[<tf.Tensor: shape=(2,), dtype=int32, numpy=array([1, 2], dtype=int32)>,
 <tf.Tensor: shape=(2,), dtype=int32, numpy= ...
I don't think you can do it the way you used to in TF1.
A work-around could be to build the batch yourself by stacking individual samples:
import tensorflow as tf

ds = tf.data.Dataset.range(10).repeat()
iterator = iter(ds)

for batch_size in range(1, 10):
    batch = tf.stack([next(iterator) for _ in range(batch_size)], axis=0)
    print(batch)

# tf.Tensor([0], shape=(1,), dtype=int64)
# tf.Tensor([1 2], shape=(2,), dtype=int64)
# tf.Tensor([3 4 5], shape=(3,), dtype=int64)
# tf.Tensor([6 7 8 9], shape=(4,), dtype=int64)
# tf.Tensor([0 1 2 3 4], shape=(5,), dtype=int64)
# tf.Tensor([5 6 7 8 9 0], shape=(6,), dtype=int64)
# tf.Tensor([1 2 3 4 5 6 7], shape=(7,), dtype=int64)
# tf.Tensor([8 9 0 1 2 3 4 5], shape=(8,), dtype=int64)
# tf.Tensor([6 7 8 9 0 1 2 3 4], shape=(9,), dtype=int64)
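This stacking workaround also slots into a custom GradientTape training loop; below is a rough sketch where the model, data, and batch-size schedule are made up for illustration:

import tensorflow as tf

model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
optimizer = tf.keras.optimizers.SGD()
loss_fn = tf.keras.losses.MeanSquaredError()

ds = tf.data.Dataset.from_tensor_slices(
    (tf.random.normal((100, 2)), tf.random.normal((100, 1)))
).repeat()
iterator = iter(ds)

for step in range(10):
    batch_size = 5 if step < 5 else 10   # made-up schedule: grow the batch size
    xs, ys = zip(*[next(iterator) for _ in range(batch_size)])
    x_batch = tf.stack(xs, axis=0)       # build the batch by hand
    y_batch = tf.stack(ys, axis=0)

    with tf.GradientTape() as tape:
        loss = loss_fn(y_batch, model(x_batch, training=True))
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    print(step, x_batch.shape[0], float(loss))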
From what I know, you have to instantiate a new dataset iterator for your change to take effect. That requires a small tweak so you skip the samples that have already been seen.
Here is my simplest solution:
import tensorflow as tf

def get_dataset(batch_size, num_samples_seen):
    return tf.data.Dataset.range(
        100
    ).skip(
        num_samples_seen
    ).batch(
        batch_size=batch_size
    )

def main():
    batch_size = 1
    num_samples_seen = 0
    train_ds = get_dataset(batch_size, num_samples_seen)
    ds_iterator = iter(train_ds)
    while True:
        try:
            data = next(ds_iterator)
        except StopIteration:
            print("End of iteration")
            break
        print(data)
        batch_size *= 2
        num_samples_seen += data.shape[0]
        ds_iterator = iter(get_dataset(batch_size, num_samples_seen))
        print("New batch size:", batch_size)

if __name__ == "__main__":
    main()
As you can see here, you have to instantiate a new dataset (through a call to get_dataset) and update the iterator.
I don't know the performance impact of such a solution. Maybe there is another solution that only needs to instantiate a batch step instead of the whole dataset.
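Tying this back to the question: the same rebuild-the-dataset idea can drive a GradientTape training loop, changing the batch size between epochs. A rough sketch with a made-up model and schedule:

import tensorflow as tf

model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
optimizer = tf.keras.optimizers.SGD()
loss_fn = tf.keras.losses.MeanSquaredError()

x = tf.random.normal((100, 2))
y = tf.random.normal((100, 1))

def get_dataset(batch_size):
    # A fresh dataset is built each time so the new batch size takes effect.
    return tf.data.Dataset.from_tensor_slices((x, y)).batch(batch_size)

for epoch, batch_size in enumerate([5, 10, 20]):   # made-up schedule
    for x_batch, y_batch in get_dataset(batch_size):
        with tf.GradientTape() as tape:
            loss = loss_fn(y_batch, model(x_batch, training=True))
        grads = tape.gradient(loss, model.trainable_variables)
        optimizer.apply_gradients(zip(grads, model.trainable_variables))
    print("epoch", epoch, "batch_size", batch_size, "loss", float(loss))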