Primer on TensorFlow and Keras: The past (TF1) the present (TF2)

The aim of this question is to ask for a bare-minimum guide to get someone up to speed with TensorFlow 1 and TensorFlow 2. I feel there isn't a coherent guide explaining the differences between TF1 and TF2, and TF has been through major revisions and is evolving at a rapid pace.

For reference when I say,

  • v1 or TF1 - I refer to TF 1.15.0
  • v2 or TF2 - I refer to TF 2.0.0

The questions I have are,

  • How does TF1/TF2 work? What are their key differences?

  • What are the different data types / data structures in TF1 and TF2?

  • What is Keras and how does it fit into all of this? What different APIs does Keras provide to implement deep learning models? Can you provide examples of each?

  • What are the most recurring warnings/errors I have to look out for while using TF and Keras?

  • Performance differences between TF1 and TF2



1 Answer

How do TF1/TF2 work? And their differences

TF1

TF1 follows an execution style known as define-then-run. This is opposed to define-by-run, which is, for example, how plain Python executes. But what does that mean? Define-then-run means that just because you called/defined something, it is not executed. You have to explicitly execute what you defined.

TF has this concept of a Graph. First you define all the computations you need (e.g. all the layer computations of a neural network, the loss computation and an optimizer that minimizes the loss - these are represented as ops or operations). After you define the computation/data-flow graph, you execute bits and pieces of it using a Session. Let's see a simple example in action.

# Graph generation
tf_a = tf.placeholder(dtype=tf.float32)
tf_b = tf.placeholder(dtype=tf.float32)
tf_c = tf.add(tf_a, tf.math.multiply(tf_b, 2.0))

# Execution
with tf.Session() as sess:
    c = sess.run(tf_c, feed_dict={tf_a: 5.0, tf_b: 2.0})
    print(c)

The computational graph (also known as data flow graph) will look like below.

     tf_a      tf_b   tf.constant(2.0)
       \         \   /
        \      tf.math.multiply
         \     /
         tf.add
            |
          tf_c

Analogy: Think of yourself making a cake. You download the recipe from the internet, then you follow the steps to actually make the cake. The recipe is the Graph, and the process of making the cake is what the Session does (i.e. the execution of the graph).

TF2

TF2 follows an immediate execution style, or define-by-run. You call/define something and it is executed. Let's see an example.

a = tf.constant(5.0)
b = tf.constant(3.0)
c = a + (b * 2.0)
print(c.numpy())

Woah! It looks so clean compared to the TF1 example. Everything looks so Pythonic.

Analogy: Now imagine you are in a hands-on cake workshop. You make the cake as the instructor explains, and the instructor tells you the result of each step immediately. So, unlike in the previous example, you don't have to wait until the cake is baked to see whether you got it right (a reference to how hard it is to debug TF1 code). Instead you get instant feedback on how you are doing.

Does that mean TF2 doesn't build a graph? Panic attack

Well, yes and no. There are two features in TF2 you should know about: eager execution and AutoGraph.

Tip: To be exact, TF1 also had eager execution (off by default); it can be enabled using tf.enable_eager_execution(). TF2 has eager execution on by default.
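
For example, a minimal sketch of enabling eager execution in TF1 (assuming TF 1.15; the call has to happen before any other TF operations):

import tensorflow as tf

tf.enable_eager_execution()  # must be called right after importing TF

a = tf.constant(5.0)
b = tf.constant(3.0)
print((a + b * 2.0).numpy())  # executes immediately and prints 11.0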

Eager execution

With eager execution, Tensors and Operations are executed immediately. This is what you observed in the TF2 example. But the flip side is that it does not build a graph. So, for example, if you use only eager execution to implement and run a neural network, it will be very slow (as neural networks perform very repetitive tasks (forward computation - loss computation - backward pass) over and over again).

AutoGraph

This is where the AutoGraph feature comes to the rescue. AutoGraph is one of my favorite features in TF2. What it does is, if you are doing "TensorFlow" stuff in a function, it analyses the function and builds the graph for you (mind blown). So, for example, if you do the following, TensorFlow builds the graph.

@tf.function
def do_silly_computation(x, y):
    a = tf.constant(x)
    b = tf.constant(y)
    c = a + (b * 2.0)
    return c

print(do_silly_computation(5.0, 3.0).numpy())

So all you need to do is define a function which takes the necessary inputs and returns the correct output. Most importantly, add the @tf.function decorator, as that's the trigger for TensorFlow AutoGraph to analyse a given function.

Warning: AutoGraph is not a silver bullet and should not be used naively. AutoGraph has various limitations too.
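
For example, one well-known limitation (a minimal sketch) is that Python side effects such as print() only run while the function is being traced, not on every call; use tf.print() if you want output on every execution.

@tf.function
def traced_fn(x):
    print("Python print: runs only while tracing")    # Python side effect, not part of the graph
    tf.print("tf.print: runs on every call, x =", x)  # becomes an op in the graph
    return x * 2.0

traced_fn(tf.constant(1.0))  # prints both lines (the function is traced here)
traced_fn(tf.constant(2.0))  # prints only the tf.print line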

Differences between TF1 and TF2

  • TF1 requires a tf.Session() object to execute the graph and TF2 doesn't
  • In TF1 the unreferenced variables were not collected by the Python GC, but in TF2 they are
  • TF1 does not promote code modularity, as you need the full graph defined before starting the computations. However, with AutoGraph functions, code modularity is encouraged

What are the different data types in TF1 and TF2?

You've already seen a lot of the main data types, but you might have questions about what they do and how they behave. Well, this section is all about that.

TF1 Data types / Data structures

  • tf.placeholder: This is how you provide inputs to the computational graph. As the name suggests, it does not have a value attached to it. Rather, you feed a value at runtime. tf_a and tf_b are examples of these. Think of this as an empty box: you fill it with water/sand/fluffy teddy bears depending on the need.

  • tf.Variable: This is what you use to define the parameters of your neural network. Unlike placeholders, variables are initialized with some value. But their value can also be changed over time. This is what happens to the parameters of a neural network during back propagation.

  • tf.Operation: Operations are the various transformations you can execute on Placeholders, Tensors and Variables. For example, tf.add() and tf.multiply() are operations. These operations return a Tensor (most of the time). If you want proof of an op that doesn't return a Tensor, check this out.

  • tf.Tensor: This is similar to a variable in the sense that it has an initial value. However, once they are defined, their value cannot be changed (i.e. they are immutable). For example, tf_c in the previous example is a tf.Tensor. (A short sketch putting these four types together follows this list.)
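
Here's a minimal sketch (assuming TF 1.15 in graph mode) that puts all four of these together:

# Placeholder: the value is fed at run time
tf_x = tf.placeholder(dtype=tf.float32, shape=[None, 3])
# Variable: mutable state, initialised with a value
tf_w = tf.Variable(tf.ones(shape=[3, 1]))
# Operation: tf.matmul transforms its inputs and returns an (immutable) Tensor
tf_y = tf.matmul(tf_x, tf_w)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(tf_y, feed_dict={tf_x: [[1.0, 2.0, 3.0]]}))  # [[6.]]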

TF2 Data types / Data structures

  • tf.Variable
  • tf.Tensor
  • tf.Operation

In terms of behavior, not much has changed in the data types going from TF1 to TF2. The main difference is that tf.placeholders are gone. You can also have a look at the full list of data types.
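
A minimal TF2 sketch of the same types, now without placeholders or sessions:

x = tf.constant([[1.0, 2.0, 3.0]])       # tf.Tensor (immutable)
w = tf.Variable(tf.ones(shape=[3, 1]))   # tf.Variable (mutable)
y = tf.matmul(x, w)                      # an Operation, executed eagerly
print(y.numpy())                         # [[6.]]
w.assign_add(tf.ones(shape=[3, 1]))      # Variables can be updated in place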

What is Keras and how does that fit in all these?

Keras used to be a separate library providing high-level implementations of components (e.g. layers and models) that are mainly used for deep learning models. In later versions of TensorFlow, Keras was integrated into TensorFlow (as tf.keras).

So, as I explained, Keras hides a lot of the unnecessary intricacies you have to deal with if you were to work with bare-bones TensorFlow. Keras offers two main things for implementing NNs: Layer objects and Model objects. Keras also has two common model APIs that let you develop models: the Sequential API and the Functional API. Let's see how different Keras and TensorFlow are with a quick example. Let's build a simple CNN.

Tip: Keras lets you achieve what you can with TF much more easily. But Keras also provides capabilities that are not yet strong in raw TF (e.g. text processing capabilities).
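
For instance, a minimal sketch of Keras' text-processing utilities (the example sentences are made up for illustration):

from tensorflow.keras.preprocessing.text import Tokenizer

tok = Tokenizer(num_words=100)                       # keep the 100 most frequent words
tok.fit_on_texts(["the cat sat", "the dog barked"])  # build the vocabulary
print(tok.texts_to_sequences(["the cat barked"]))    # e.g. [[1, 2, 5]]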

height = 64
width = 64
n_channels = 3
n_outputs = 10

Keras (Sequential API) example

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

model = Sequential()
model.add(Conv2D(filters=32, kernel_size=(2,2), activation='relu',
                 input_shape=(height, width, n_channels)))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Conv2D(filters=64, kernel_size=(2,2), activation='relu'))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Flatten())
model.add(Dense(n_outputs, activation='softmax'))
model.compile(loss='binary_crossentropy', optimizer='adam')
model.summary()

Pros

Straight-forward to implement simple models

Cons

Cannot be used to implement complex models (e.g. models with multiple inputs)

Keras (Functional API) example

from tensorflow.keras.layers import Input
from tensorflow.keras.models import Model

inp = Input(shape=(height, width, n_channels))
out = Conv2D(filters=32, kernel_size=(2,2), activation='relu',input_shape=(height, width, n_channels))(inp)
out = MaxPooling2D(pool_size=(2,2))(out)
out = Conv2D(filters=64, kernel_size=(2,2), activation='relu')(out)
out = MaxPooling2D(pool_size=(2,2))(out)
out = Flatten()(out)
out = Dense(n_outputs, activation='softmax')(out)
model = Model(inputs=inp, outputs=out)
model.compile(loss='binary_crossentropy', optimizer='adam')
model.summary()

Pros

Can be used to implement complex models involving multiple inputs and outputs (a multi-input sketch is shown below, after the cons)

Cons

You need a very good understanding of the shapes of the inputs and outputs, and of what each layer expects as input
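
To illustrate the multi-input case, here's a minimal sketch (the second input and the layer sizes are made up for illustration):

from tensorflow.keras.layers import Concatenate

img_in = Input(shape=(height, width, n_channels))  # image branch
x = Conv2D(filters=32, kernel_size=(2,2), activation='relu')(img_in)
x = Flatten()(x)

meta_in = Input(shape=(5,))                        # a second, non-image input
y = Dense(8, activation='relu')(meta_in)

merged = Concatenate()([x, y])                     # join the two branches
out = Dense(n_outputs, activation='softmax')(merged)

model = Model(inputs=[img_in, meta_in], outputs=out)
model.compile(loss='categorical_crossentropy', optimizer='adam')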

TF1 example

# Input
tf_in = tf.placeholder(shape=[None, height, width, n_channels], dtype=tf.float32)

# 1st conv and max pool
conv1 = tf.Variable(tf.initializers.glorot_uniform()([2,2,3,32]))
tf_out = tf.nn.conv2d(tf_in, filters=conv1, strides=[1,1,1,1], padding='SAME') # 64,64
tf_out = tf.nn.max_pool2d(tf_out, ksize=[2,2], strides=[1,2,2,1], padding='SAME') # 32,32

# 2nd conv and max pool
conv2 = tf.Variable(tf.initializers.glorot_uniform()([2,2,32,64]))
tf_out = tf.nn.conv2d(tf_out, filters=conv2, strides=[1,1,1,1], padding='SAME') # 32, 32
tf_out = tf.nn.max_pool2d(tf_out, ksize=[2,2], strides=[1,2,2,1], padding='SAME') # 16, 16
tf_out = tf.reshape(tf_out, [-1, 16*16*64])

# Dense layer
dense = tf.Variable(tf.initializers.glorot_uniform()([16*16*64, n_outputs]))
tf_out = tf.matmul(tf_out, dense)

Pros

Is very good for cutting edge research involving atypical operations (e.g. changing the sizes of layers dynamically)

Cons

Poor readability

Caveats and Gotchas

Here I will list a few things you have to watch out for when using TF (based on my experience).

TF1 - Forgetting to feed all the dependent placeholders to compute the result

tf_a = tf.placeholder(dtype=tf.float32)
tf_b = tf.placeholder(dtype=tf.float32)
tf_c = tf.add(tf_a, tf.math.multiply(tf_b, 2.0))

with tf.Session() as sess:
    c = sess.run(tf_c, feed_dict={tf_a: 5.0})
    print(c)

InvalidArgumentError: You must feed a value for placeholder tensor 'Placeholder_8' with dtype float [[node Placeholder_8 (defined at /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py:1748) ]]

The reason you get an error here is that you haven't fed a value to tf_b. So make sure you feed values to all the dependent placeholders when computing a result.
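
A minimal sketch of the corrected call, feeding both placeholders:

with tf.Session() as sess:
    c = sess.run(tf_c, feed_dict={tf_a: 5.0, tf_b: 2.0})  # both tf_a and tf_b are fed
    print(c)  # 9.0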

TF1 - Be very very careful of data types

tf_a = tf.placeholder(dtype=tf.int32)
tf_b = tf.placeholder(dtype=tf.float32)
tf_c = tf.add(tf_a, tf.math.multiply(tf_b, 2.0))

with tf.Session() as sess:
    c = sess.run(tf_c, feed_dict={tf_a: 5, tf_b: 2.0})
    print(c)

TypeError: Input 'y' of 'Add' Op has type float32 that does not match type int32 of argument 'x'.

Can you spot the error? It is because data types have to match when you pass tensors to operations. If they don't, use the tf.cast() operation to cast one data type to a compatible one.
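
For example, a minimal sketch that casts the int placeholder before the addition:

tf_a = tf.placeholder(dtype=tf.int32)
tf_b = tf.placeholder(dtype=tf.float32)
# cast tf_a to float32 so both inputs to tf.add have the same dtype
tf_c = tf.add(tf.cast(tf_a, tf.float32), tf.math.multiply(tf_b, 2.0))

with tf.Session() as sess:
    print(sess.run(tf_c, feed_dict={tf_a: 5, tf_b: 2.0}))  # 9.0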

Keras - Understand what input shape each layer expects

model = Sequential()
model.add(Conv2D(filters=32, kernel_size=(2,2), activation='relu',
                 input_shape=(height, width)))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Conv2D(filters=64, kernel_size=(2,2), activation='relu'))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Flatten())
model.add(Dense(n_outputs, activation='softmax'))
model.compile(loss='binary_crossentropy', optimizer='adam')

ValueError: Input 0 of layer conv2d_8 is incompatible with the layer: expected ndim=4, found ndim=3. Full shape received: [None, 64, 64]

Here, you have defined an input shape of [None, height, width] (once you add the batch dimension). But Conv2D expects a 4D input [None, height, width, n_channels]. Therefore you get the error above (a corrected snippet follows the list below). Some commonly misunderstood/error-prone layers are,

  • Conv2D layer - Expects a 4D input [None, height, width, n_channels]. To know about the convolution layer/operation in more detail have a look at this answer
  • LSTM layer - Expects a 3D input [None, timesteps, n_dim]
  • ConvLSTM2D layer - Expects a 5D input [None, timesteps, height, width, n_channels]
  • Concatenate layer - Except for the axis along which the data is concatenated, all other dimensions need to be the same
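
A minimal sketch of the fix for the error above, passing the channel dimension in input_shape:

model = Sequential()
model.add(Conv2D(filters=32, kernel_size=(2,2), activation='relu',
                 input_shape=(height, width, n_channels)))  # 4D once the batch dimension is added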

Keras - Feeding in the wrong input/output shape during fit()

height=64
width = 64
n_channels = 3
n_outputs = 10

Xtrain = np.random.normal(size=(500, height, width, 1))
Ytrain = np.random.choice([0,1], size=(500, n_outputs))

# Build the model

# fit network
model.fit(Xtrain, Ytrain, epochs=10, batch_size=32, verbose=0)

ValueError: Error when checking input: expected conv2d_9_input to have shape (64, 64, 3) but got array with shape (64, 64, 1)

You should know this one. We are feeding an input of shape [batch size, height, width, 1] when we should be feeding a [batch size, height, width, 3] input.
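
A minimal sketch of the fix, generating the training data with the channel count the model expects:

Xtrain = np.random.normal(size=(500, height, width, n_channels))  # 3 channels, matching the model's input
Ytrain = np.random.choice([0,1], size=(500, n_outputs))

model.fit(Xtrain, Ytrain, epochs=10, batch_size=32, verbose=0)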

Performance differences between TF1 and TF2

This has already been discussed here, so I will not reiterate what's in there.

Things I wish I could have talked about but couldn't

I'm leaving you with a few pointers for further reading (a minimal tf.data.Dataset sketch follows the list).

  • tf.data.Dataset
  • tf.RaggedTensor
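
For instance, a minimal tf.data.Dataset sketch (reusing the Xtrain/Ytrain arrays from above):

dataset = tf.data.Dataset.from_tensor_slices((Xtrain, Ytrain))  # wrap the numpy arrays
dataset = dataset.shuffle(buffer_size=500).batch(32)            # shuffle and batch
for x_batch, y_batch in dataset.take(1):
    print(x_batch.shape, y_batch.shape)                         # (32, 64, 64, 3) (32, 10)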