Within a function decorated with tf.function, I try to call another function decorated with tf.function. The result is horribly slow.
Is that because I am not supposed to use Python native types in the function? Related question: "Tensorflow 2.0 model using tf.function very slow and is recompiling every time the train count changes. Eager runs about 4x faster".
Test:
import numpy as np
import tensorflow as tf

@tf.function
def loop(x, y):
    for i in range(1000):
        x.assign_add(y)
    return x

@tf.function
def loop2(x, y):
    for i in range(1000):
        loop(x, y)
    return x

def main():
    print("TensorFlow version: {}".format(tf.__version__))
    print("Eager execution: {}".format(tf.executing_eagerly()))

    x = tf.Variable(initial_value=0, dtype=np.float32)
    y = tf.Variable(initial_value=1, dtype=np.float32)

    # print(loop2(x, y))  # horribly slow
    for i in range(1000):  # faster
        loop(x, y)

main()
You should read part 3 of the article cited in the answer you linked.
In part 3, you can see that the problem arises not only when using Python native types, but also when using Python constructs (like for) that operate on Python types rather than on tf.Tensor objects.
In particular, when you loop over a range instead of a tf.range, you're building a huge graph, since you're repeating the loop body 1000 times (you're unrolling the loop).
If you replace range with tf.range, everything goes way faster.
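You can check the unrolling yourself by counting the operations in the traced graph. This is a minimal sketch (assuming TF 2.x, where a tf.function exposes get_concrete_function): the Python-range version produces on the order of hundreds of ops, while the tf.range version stays small because AutoGraph lowers it to a single graph-native loop.

import tensorflow as tf

@tf.function
def unrolled(x, y):
    for i in range(100):     # Python range: the body is copied into the graph 100 times
        x.assign_add(y)
    return x

@tf.function
def rolled(x, y):
    for i in tf.range(100):  # tf.range: converted to one graph-native while loop
        x.assign_add(y)
    return x

x = tf.Variable(0.0)
y = tf.Variable(1.0)

# Trace each function once, then count the ops in the resulting graph.
print(len(unrolled.get_concrete_function(x, y).graph.get_operations()))  # hundreds of ops
print(len(rolled.get_concrete_function(x, y).graph.get_operations()))    # far fewer ops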
Proof.
Your code (with time measurements and 100 instead of 1000):
import numpy as np
import tensorflow as tf
from time import time

@tf.function
def loop(x, y):
    for i in range(100):
        x.assign_add(y)
    return x

@tf.function
def loop2(x, y):
    for i in range(100):
        loop(x, y)
    return x

def main():
    print("TensorFlow version: {}".format(tf.__version__))
    print("Eager execution: {}".format(tf.executing_eagerly()))

    x = tf.Variable(initial_value=0, dtype=np.float32)
    y = tf.Variable(initial_value=1, dtype=np.float32)

    print("one")
    start = time()
    print(loop2(x, y))  # horribly slow
    print("end: ", time() - start)

    print("second: ")
    start = time()
    for i in range(100):  # faster
        loop(x, y)
    print("end: ", time() - start)

main()
The output:
TensorFlow version: 2.0.0-beta0
Eager execution: True
one
tf.Tensor(10000.0, shape=(), dtype=float32)
end: 86.44128751754761
second:
end: 0.08476066589355469
Updated code using only TensorFlow methods:
@tf.function
def loop__(x, y):
    for i in tf.range(100):
        x.assign_add(y)
    return x

@tf.function
def loop2__(x, y):
    for i in tf.range(100):
        loop__(x, y)
    return x

def main():
    print("TensorFlow version: {}".format(tf.__version__))
    print("Eager execution: {}".format(tf.executing_eagerly()))

    x = tf.Variable(initial_value=0, dtype=np.float32)
    y = tf.Variable(initial_value=1, dtype=np.float32)

    print("one")
    start = time()
    print(loop2__(x, y))  # no longer slow
    print("end: ", time() - start)

    print("second: ")
    start = time()
    for i in tf.range(100):
        loop__(x, y)
    print("end: ", time() - start)

main()
The output:
TensorFlow version: 2.0.0-beta0
Eager execution: True
one
tf.Tensor(10000.0, shape=(), dtype=float32)
end: 0.4946322441101074
second:
end: 0.24096465110778809
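Why does tf.range change things so much? A Python range is consumed at trace time, so the loop body is copied into the graph once per iteration, whereas a tf.range is converted by AutoGraph into a single graph-native loop. Roughly, the tf.range version is lowered to something like the hand-written tf.while_loop below (a sketch of the idea, not the exact code AutoGraph emits):

@tf.function
def loop_while(x, y):
    # Roughly what AutoGraph generates for `for i in tf.range(100)`:
    # a single While op in the graph, with the body traced only once.
    def cond(i):
        return i < 100

    def body(i):
        x.assign_add(y)  # stateful op inside the loop body
        return [i + 1]

    tf.while_loop(cond, body, [tf.constant(0)])
    return x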
A few points to keep in mind about @tf.function (laymanesque):
- tf.function caches the graphs it traces. The cache key is a TensorSpec if the input to the function is a Tensor, and a tuple with the actual values of the arguments if the inputs to the function are not tensors.
- The TensorSpec key is the same for every tensor with the same shape and dtype, so such calls reuse the already-traced graph.
- If tf.range is used, TensorFlow knows how to handle the loop without unrolling it. Unrolling has an overhead the first time the function is run, but unrolled loops are always faster than the loop itself afterwards. So the behavior you will notice is this: with a Python iterable, as opposed to the TensorFlow equivalent (tf.range), the first function run is significantly slow and the graph thus created will consume more memory on the accelerator, but all subsequent runs are significantly faster because the graph uses an unrolled loop. (A small tracing sketch after the demos below illustrates the caching behavior.)
Demo:
With tf.range:
@tf.function
def loop__(x, y):
    for i in tf.range(10000):
        x.assign_add(y)
    return x

@tf.function
def loop2__(x, y):
    for i in tf.range(100):
        loop__(x, y)
    return x

x = tf.Variable(initial_value=0, dtype=np.float32)
y = tf.Variable(initial_value=1, dtype=np.float32)

start = time()
print(loop2__(x, y))
print("first run with tf.range", time() - start)

start = time()
print(loop2__(x, y))
print("second run with tf.range", time() - start)
output:
tf.Tensor(1000000.0, shape=(), dtype=float32)
first run with tf.range 10.322974920272827
tf.Tensor(2000000.0, shape=(), dtype=float32)
second run with tf.range 11.379822969436646
With Python range:
@tf.function
def loop__(x, y):
    for i in range(10000):
        x.assign_add(y)
    return x

@tf.function
def loop2__(x, y):
    for i in tf.range(100):
        loop__(x, y)
    return x

x = tf.Variable(initial_value=0, dtype=np.float32)
y = tf.Variable(initial_value=1, dtype=np.float32)

start = time()
print(loop2__(x, y))
print("first run with python range", time() - start)

start = time()
print(loop2__(x, y))
print("second run with python range", time() - start)
output (with loads of warnings about inefficient graph unrolling):
tf.Tensor(1000000.0, shape=(), dtype=float32)
first run with python range 51.13001751899719
tf.Tensor(2000000.0, shape=(), dtype=float32)
second run with python range 1.1093688011169434
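To see the cache-key behavior from the list above directly, you can put a Python print inside a tf.function: it executes only while the function is being traced, so it reveals exactly when a retrace happens. A minimal sketch, assuming TF 2.x:

@tf.function
def f(x):
    print("Tracing with", x)  # Python side effect: runs only during tracing
    return x + 1

f(tf.constant(1.0))  # traces once for the float32 scalar TensorSpec
f(tf.constant(2.0))  # same shape/dtype -> same TensorSpec key -> no retrace
f(1)                 # Python int: the value itself is the cache key -> retrace
f(2)                 # different Python value -> yet another retrace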