Why is this TensorFlow training taking so long?

I'm learning DRL with the book Deep Reinforcement Learning in Action. In chapter 3, they present the simple game Gridworld (instructions here, in the rules section) with the corresponding code in PyTorch.

I've experimented with the code, and it takes less than 3 minutes to train the network to an 89% win rate (it won 89 of 100 games after training).

[Plot: training loss with PyTorch]

As an exercise, I have migrated the code to TensorFlow. All the code is here.

The problem is that my TensorFlow port takes nearly 2 hours to train the network to an 84% win rate. Both versions train on the CPU only (I don't have a GPU).

[Plot: training loss with TensorFlow]

The training loss figures look correct, and so does the win rate (we have to take into account that the game is random and can produce impossible states). The problem is the performance of the overall process.

I must be doing something terribly wrong, but what?

The main differences are in the training loop. In PyTorch it is:

        loss_fn = torch.nn.MSELoss()
        learning_rate = 1e-3
        optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
        ....
        Q1 = model(state1_batch)      # Q-values for the current states
        with torch.no_grad():
            Q2 = model2(state2_batch) #B target-network Q-values, no gradients

        Y = reward_batch + gamma * ((1-done_batch) * torch.max(Q2,dim=1)[0])   # TD target
        X = Q1.gather(dim=1,index=action_batch.long().unsqueeze(dim=1)).squeeze()  # Q-values of the actions taken
        loss = loss_fn(X, Y.detach())
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

and in the tensorflow version:

        loss_fn = tf.keras.losses.MSE
        learning_rate = 1e-3
        optimizer = tf.keras.optimizers.Adam(learning_rate)
        ...
        Q2 = model2(state2_batch) #B target-network Q-values
        with tf.GradientTape() as tape:
            Q1 = model(state1_batch)  # Q-values for the current states
            Y = reward_batch + gamma * ((1-done_batch) * tf.math.reduce_max(Q2, axis=1))  # TD target
            X = [Q1[i][action_batch[i]] for i in range(len(action_batch))]  # Q-values of the actions taken
            loss = loss_fn(X, Y)
        grads = tape.gradient(loss, model.trainable_variables)
        optimizer.apply_gradients(zip(grads, model.trainable_variables))

Why is the training taking so long?

Asked May 04 '21 by Ivan


People also ask

How long does TensorFlow take to train?

Training usually takes between 2 and 8 hours, depending on the number of files and queued models for training.

How do I speed up TensorFlow training?

To optimize training speed, you want your GPUs to be running at 100% utilization. nvidia-smi is handy for checking that your process is actually running on the GPU, but when it comes to ongoing GPU monitoring, there are smarter tools out there.
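
Before trusting utilization numbers, it can be worth confirming that TensorFlow sees a GPU at all. A minimal check, using the standard tf.config API (nothing here is specific to any particular setup):

import tensorflow as tf

# An empty list means TensorFlow will silently fall back to the CPU.
print("GPUs visible to TensorFlow:", tf.config.list_physical_devices('GPU'))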

Why is TensorFlow training so slow?

Most slowness is caused by an unoptimized read pipeline: much of the time the network is simply waiting on reads from disk rather than processing data. For this reason TensorFlow created a dedicated file format, TFRecord, to lower disk read time. For the same reason, the data-reading part of the training code should run on the CPU.
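
As a rough sketch of the TFRecord idea (the file name and feature layout below are invented for the example):

import tensorflow as tf

# Write one serialized example to a TFRecord file.
with tf.io.TFRecordWriter("data.tfrecord") as writer:
    example = tf.train.Example(features=tf.train.Features(feature={
        "x": tf.train.Feature(float_list=tf.train.FloatList(value=[1.0, 2.0])),
    }))
    writer.write(example.SerializeToString())

# Read it back as a tf.data pipeline; TFRecordDataset streams from disk,
# so the accelerator is not left waiting on slow Python file I/O.
def parse(record):
    return tf.io.parse_single_example(
        record, {"x": tf.io.FixedLenFeature([2], tf.float32)})

dataset = tf.data.TFRecordDataset("data.tfrecord").map(parse).batch(32)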

Is TensorFlow a good library for deep learning?

TensorFlow is a really great library for deep learning. It has good support for GPU acceleration, but it currently only supports CUDA, so bad news if you have an AMD card: you have to rely on the CPU to run models, and CPUs aren't made for that kind of parallel load, so it will be slow.

Why is TensorFlow faster than Caffe when using GPU?

In my recent research, TensorFlow proved faster than Caffe (a framework that was always the fastest in older papers) when using GPUs and cuDNN >= 7, by using batched SGEMM: multiple tiny matrix multiplications executed in parallel.
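
To make "batched SGEMM" concrete, here is a minimal sketch (the shapes are arbitrary): tf.matmul treats leading dimensions as a batch, so many tiny products are dispatched as one op.

import tensorflow as tf

# 1024 independent 4x4 matrix products executed as a single batched
# operation rather than 1024 separate tiny kernels.
a = tf.random.normal((1024, 4, 4))
b = tf.random.normal((1024, 4, 4))
c = tf.matmul(a, b)  # shape (1024, 4, 4)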

Is TensorFlow product-oriented or service-oriented?

As TensorFlow has been used by Google for so long, it is very easy to deploy algorithms with it, so you can think of it as more product-oriented. Logically, you want to be able to deploy the algorithms you create (you can check out TensorFlow Serving [2] for more on that).




1 Answer

Why is TensorFlow slow

TensorFlow has two execution modes: eager execution and graph mode. Since version 2, TensorFlow defaults to eager execution. Eager execution is great because it lets you write code close to standard Python: it's easier to write and easier to debug. Unfortunately, it's really not as fast as graph mode.

So the idea is, once the function has been prototyped in eager mode, to make TensorFlow execute it in graph mode. For that you can use tf.function, which compiles a callable into a TensorFlow graph. Once the function is compiled into a graph, the performance gain is usually substantial. The recommended approach when developing in TensorFlow is the following:

  • Debug in eager mode, then decorate with @tf.function.
  • Don't rely on Python side effects like object mutation or list appends.
  • tf.function works best with TensorFlow ops; NumPy and Python calls are converted to constants.

I would add: think about the critical parts of your program, and which ones should be converted to graph mode first. It's usually the parts where you call a model to get a result, and that's where you will see the biggest improvements.
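
You can measure the difference yourself. A minimal sketch (the model and input below are placeholders, not the network from the question): time an eager call against the same callable wrapped in tf.function.

import timeit
import tensorflow as tf

# A throwaway model, just to illustrate the pattern.
model = tf.keras.Sequential([tf.keras.layers.Dense(64, activation="relu"),
                             tf.keras.layers.Dense(4)])
x = tf.random.normal((128, 64))

graph_call = tf.function(model)  # compiled into a graph on first call
graph_call(x)                    # trace once, outside the timing

print("eager:", timeit.timeit(lambda: model(x), number=100))
print("graph:", timeit.timeit(lambda: graph_call(x), number=100))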

You can find more information in the following guides:

  • Better performance with tf.function
  • Introduction to graphs and tf.function

Applying tf.function to your code

So, there are at least two things you can change in your code to make it run considerably faster:

  1. The first is to not use model.predict on a small amount of data. The function is made to work on a huge dataset or on a generator (see this comment on GitHub). Instead, you should call the model directly, and for a further performance boost, you can wrap the call to the model in a tf.function.

Model.predict is a top-level API designed for batch-predicting outside of any loops, with the full features of the Keras APIs.

  2. The second is to make your training step a separate function, and to decorate that function with @tf.function.

So, I would declare the following things before your training loop:

# to call instead of model.predict
model_func = tf.function(model)

def get_train_func(model, model2, loss_fn, optimizer):
    """Wrapper that creates a train step using the two model passed"""
    @tf.function
    def train_func(state1_batch, state2_batch, done_batch, reward_batch, action_batch):
        Q2 = model2(state2_batch) #B
        with tf.GradientTape() as tape:
            Q1 = model(state1_batch)
            Y = reward_batch + gamma * ((1-done_batch) * tf.math.reduce_max(Q2, axis=1))
            # gather is more efficient than a list comprehension, and needed in a tf.function
            X = tf.gather(Q1, action_batch, batch_dims=1)
            loss = loss_fn(X, Y)
        grads = tape.gradient(loss, model.trainable_variables)
        optimizer.apply_gradients(zip(grads, model.trainable_variables))
        return loss
    return train_func

# train step is a callable 
train_step = get_train_func(model, model2, loss_fn, optimizer)

And you can use that function in your training loop:

if len(replay) > batch_size:
    minibatch = random.sample(replay, batch_size)
    state1_batch = np.array([s1 for (s1,a,r,s2,d) in minibatch]).reshape((batch_size, 64))
    action_batch = np.array([a for (s1,a,r,s2,d) in minibatch])   # TODO: possible differences
    reward_batch = np.float32([r for (s1,a,r,s2,d) in minibatch])
    state2_batch = np.array([s2 for (s1,a,r,s2,d) in minibatch]).reshape((batch_size, 64))
    done_batch = np.array([d for (s1,a,r,s2,d) in minibatch]).astype(np.float32)

    loss = train_step(state1_batch, state2_batch, done_batch, reward_batch, action_batch)
    losses.append(loss)
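
One small caveat: the loss returned by train_step here is a tf.Tensor. If you only need the scalar for plotting, appending float(loss) (or loss.numpy()) keeps losses as a list of plain Python numbers rather than tensors.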

There are other changes you could make to render your code more TensorFlow-idiomatic, but with those modifications, your code takes ~2 minutes on my CPU (with a 97% win rate).

Answered Oct 19 '22 by Lescurel