Within a function decorated with tf.function, I try to call another function decorated with tf.function. The result is horribly slow.
Is that because I am not supposed to use Python native types in the function? Related question: "Tensorflow 2.0 model using tf.function very slow and is recompiling every time the train count changes. Eager runs about 4x faster".
Test:
import numpy as np
import tensorflow as tf

@tf.function
def loop(x, y):
    for i in range(1000):
        x.assign_add(y)
    return x

@tf.function
def loop2(x, y):
    for i in range(1000):
        loop(x, y)
    return x

def main():
    print("TensorFlow version: {}".format(tf.__version__))
    print("Eager execution: {}".format(tf.executing_eagerly()))

    x = tf.Variable(initial_value=0, dtype=np.float32)
    y = tf.Variable(initial_value=1, dtype=np.float32)

    # print(loop2(x, y))  # horribly slow
    for i in range(1000):  # faster
        loop(x, y)

main()
You should read part 3 of the article cited in the answer you linked.
In part 3, you can see that the problem arises not only when using Python native types, but also when using Python constructs (like for) that operate on Python types rather than on tf.Tensor objects.
In particular, when you loop over a range instead of a tf.range, you're building a huge graph, since you're repeating the loop body 1000 times (you're unrolling the loop).
If you replace range with tf.range, everything goes way faster.
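You can check the unrolling yourself by counting the operations in the traced graph. This is a minimal sketch (assuming TF 2.x, where a tf.function exposes get_concrete_function): the Python-range version produces on the order of hundreds of ops, while the tf.range version stays small because AutoGraph lowers it to a single graph-native loop.

import tensorflow as tf

@tf.function
def unrolled(x, y):
    for i in range(100):     # Python range: the body is copied into the graph 100 times
        x.assign_add(y)
    return x

@tf.function
def rolled(x, y):
    for i in tf.range(100):  # tf.range: converted to one graph-native while loop
        x.assign_add(y)
    return x

x = tf.Variable(0.0)
y = tf.Variable(1.0)

# Trace each function once, then count the ops in the resulting graph.
print(len(unrolled.get_concrete_function(x, y).graph.get_operations()))  # hundreds of ops
print(len(rolled.get_concrete_function(x, y).graph.get_operations()))    # far fewer ops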
Proof.
Your code (with time measurements and 100 instead of 1000):
import numpy as np
import tensorflow as tf
from time import time

@tf.function
def loop(x, y):
    for i in range(100):
        x.assign_add(y)
    return x

@tf.function
def loop2(x, y):
    for i in range(100):
        loop(x, y)
    return x

def main():
    print("TensorFlow version: {}".format(tf.__version__))
    print("Eager execution: {}".format(tf.executing_eagerly()))

    x = tf.Variable(initial_value=0, dtype=np.float32)
    y = tf.Variable(initial_value=1, dtype=np.float32)

    print("one")
    start = time()
    print(loop2(x, y))  # horribly slow
    print("end: ", time() - start)

    print("second: ")
    start = time()
    for i in range(100):  # faster
        loop(x, y)
    print("end: ", time() - start)

main()
The output:
TensorFlow version: 2.0.0-beta0
Eager execution: True
one
tf.Tensor(10000.0, shape=(), dtype=float32)
end: 86.44128751754761
second:
end: 0.08476066589355469
Updated code using only TensorFlow methods:
@tf.function
def loop__(x, y):
    for i in tf.range(100):
        x.assign_add(y)
    return x

@tf.function
def loop2__(x, y):
    for i in tf.range(100):
        loop__(x, y)
    return x

def main():
    print("TensorFlow version: {}".format(tf.__version__))
    print("Eager execution: {}".format(tf.executing_eagerly()))

    x = tf.Variable(initial_value=0, dtype=np.float32)
    y = tf.Variable(initial_value=1, dtype=np.float32)

    print("one")
    start = time()
    print(loop2__(x, y))  # no longer slow
    print("end: ", time() - start)

    print("second: ")
    start = time()
    for i in tf.range(100):
        loop__(x, y)
    print("end: ", time() - start)

main()
The output:
TensorFlow version: 2.0.0-beta0
Eager execution: True
one
tf.Tensor(10000.0, shape=(), dtype=float32)
end: 0.4946322441101074
second:
end: 0.24096465110778809
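Why does tf.range change things so much? A Python range is consumed at trace time, so the loop body is copied into the graph once per iteration, whereas a tf.range is converted by AutoGraph into a single graph-native loop. Roughly, the tf.range version is lowered to something like the hand-written tf.while_loop below (a sketch of the idea, not the exact code AutoGraph emits):

@tf.function
def loop_while(x, y):
    # Roughly what AutoGraph generates for `for i in tf.range(100)`:
    # a single While op in the graph, with the body traced only once.
    def cond(i):
        return i < 100

    def body(i):
        x.assign_add(y)  # stateful op inside the loop body
        return [i + 1]

    tf.while_loop(cond, body, [tf.constant(0)])
    return x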
A few points to keep in mind about @tf.function (laymanesque):
- tf.function caches the graphs it traces. The cache key is a TensorSpec if the input to the function is a Tensor, and a tuple with the actual values of the arguments if the inputs to the function are not tensors.
- The TensorSpec key is the same for every tensor with the same shape and dtype, so such calls reuse the already-traced graph.
- If tf.range is used, TensorFlow knows how to handle the loop without unrolling it. Unrolling has an overhead the first time the function is run, but unrolled loops are always faster than the loop itself afterwards. So the behavior you will notice is this: with a Python iterable, as opposed to the TensorFlow equivalent (tf.range), the first function run is significantly slow and the graph thus created will consume more memory on the accelerator, but all subsequent runs are significantly faster because the graph uses an unrolled loop. (A small tracing sketch after the demos below illustrates the caching behavior.)
Demo:
With tf.range:
@tf.function
def loop__(x, y):
    for i in tf.range(10000):
        x.assign_add(y)
    return x

@tf.function
def loop2__(x, y):
    for i in tf.range(100):
        loop__(x, y)
    return x

x = tf.Variable(initial_value=0, dtype=np.float32)
y = tf.Variable(initial_value=1, dtype=np.float32)

start = time()
print(loop2__(x, y))
print("first run with tf.range", time() - start)

start = time()
print(loop2__(x, y))
print("second run with tf.range", time() - start)
output:
tf.Tensor(1000000.0, shape=(), dtype=float32)
first run with tf.range 10.322974920272827
tf.Tensor(2000000.0, shape=(), dtype=float32)
second run with tf.range 11.379822969436646
With Python range:
@tf.function
def loop__(x, y):
    for i in range(10000):
        x.assign_add(y)
    return x

@tf.function
def loop2__(x, y):
    for i in tf.range(100):
        loop__(x, y)
    return x

x = tf.Variable(initial_value=0, dtype=np.float32)
y = tf.Variable(initial_value=1, dtype=np.float32)

start = time()
print(loop2__(x, y))
print("first run with python range", time() - start)

start = time()
print(loop2__(x, y))
print("second run with python range", time() - start)
output (with loads of warnings about inefficient graph unrolling):
tf.Tensor(1000000.0, shape=(), dtype=float32)
first run with python range 51.13001751899719
tf.Tensor(2000000.0, shape=(), dtype=float32)
second run with python range 1.1093688011169434
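To see the cache-key behavior from the list above directly, you can put a Python print inside a tf.function: it executes only while the function is being traced, so it reveals exactly when a retrace happens. A minimal sketch, assuming TF 2.x:

@tf.function
def f(x):
    print("Tracing with", x)  # Python side effect: runs only during tracing
    return x + 1

f(tf.constant(1.0))  # traces once for the float32 scalar TensorSpec
f(tf.constant(2.0))  # same shape/dtype -> same TensorSpec key -> no retrace
f(1)                 # Python int: the value itself is the cache key -> retrace
f(2)                 # different Python value -> yet another retrace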