Is the change in tf.contrib.layers.fully_connected() behavior between TensorFlow 1.3 and 1.4 an issue?

I was recently completing a CNN implementation in TensorFlow for an online course (which I would prefer not to name, to avoid breaking the platform rules). I ran into surprising results: my local implementation diverged significantly from the one on the platform server. After further investigation, I narrowed the problem down to a change in the behaviour of tf.contrib.layers.fully_connected() between TensorFlow versions 1.3 and 1.4.

I prepared a small subset of the source code to reproduce the issue:

import numpy as np
import tensorflow as tf

np.random.seed(1)

def create_placeholders(n_H0, n_W0, n_C0, n_y):
    X = tf.placeholder(tf.float32, [None, n_H0, n_W0, n_C0])
    Y = tf.placeholder(tf.float32, [None, n_y])
    return X, Y

def initialize_parameters():
    tf.set_random_seed(1)
    W1 = tf.get_variable("W1", [4, 4, 3, 8], initializer=tf.contrib.layers.xavier_initializer(seed=0))
    W2 = tf.get_variable("W2", [2, 2, 8, 16], initializer=tf.contrib.layers.xavier_initializer(seed=0))
    parameters = {"W1": W1, "W2": W2}
    return parameters

def forward_propagation(X, parameters):
    W1 = parameters['W1']
    W2 = parameters['W2']
    # CONV -> RELU -> MAXPOOL (block 1)
    Z1 = tf.nn.conv2d(X, W1, strides=[1, 1, 1, 1], padding='SAME')
    A1 = tf.nn.relu(Z1)
    P1 = tf.nn.max_pool(A1, ksize=[1, 8, 8, 1], strides=[1, 8, 8, 1], padding='SAME')
    # CONV -> RELU -> MAXPOOL (block 2)
    Z2 = tf.nn.conv2d(P1, W2, strides=[1, 1, 1, 1], padding='SAME')
    A2 = tf.nn.relu(Z2)
    P2 = tf.nn.max_pool(A2, ksize=[1, 4, 4, 1], strides=[1, 4, 4, 1], padding='SAME')
    # FLATTEN -> FULLY CONNECTED (6 output units, no activation)
    F2 = tf.contrib.layers.flatten(P2)
    Z3 = tf.contrib.layers.fully_connected(F2, 6, activation_fn=None)
    return Z3

tf.reset_default_graph()
with tf.Session() as sess:
    np.random.seed(1)
    X, Y = create_placeholders(64, 64, 3, 6)
    parameters = initialize_parameters()
    Z3 = forward_propagation(X, parameters)
    init = tf.global_variables_initializer()
    sess.run(init)
    a = sess.run(Z3, {X: np.random.randn(2,64,64,3), Y: np.random.randn(2,6)})
    print("Z3 = " + str(a))

When running TensorFlow 1.3 or earlier (1.2.1 tested as well), the output for Z3 is:

Z3 = [[-0.44670227 -1.57208765 -1.53049231 -2.31013036 -1.29104376  0.46852064]
 [-0.17601591 -1.57972014 -1.4737016  -2.61672091 -1.00810647  0.5747785 ]]

When running TensorFlow 1.4 or later (tested up to 1.7), the output for Z3 is:

Z3 = [[ 1.44169843 -0.24909666  5.45049906 -0.26189619 -0.20669907  1.36546707]
 [ 1.40708458 -0.02573211  5.08928013 -0.48669922 -0.40940708  1.26248586]]

A detailed review of all the tensors in forward_propagation() (i.e. Wx, Ax, Px, etc.) points to tf.contrib.layers.fully_connected(), since Z3 is the only tensor that diverges.
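For reference, this is roughly the kind of check I ran. The debug variant of forward_propagation() below is just an illustration, not part of the assignment code:

# Illustrative only: a debug variant of forward_propagation() that also
# returns the intermediate tensors, so a single sess.run() can dump them
# all for a side-by-side diff between the two TensorFlow installs.
def forward_propagation_debug(X, parameters):
    W1, W2 = parameters['W1'], parameters['W2']
    Z1 = tf.nn.conv2d(X, W1, strides=[1, 1, 1, 1], padding='SAME')
    A1 = tf.nn.relu(Z1)
    P1 = tf.nn.max_pool(A1, ksize=[1, 8, 8, 1], strides=[1, 8, 8, 1], padding='SAME')
    Z2 = tf.nn.conv2d(P1, W2, strides=[1, 1, 1, 1], padding='SAME')
    A2 = tf.nn.relu(Z2)
    P2 = tf.nn.max_pool(A2, ksize=[1, 4, 4, 1], strides=[1, 4, 4, 1], padding='SAME')
    F2 = tf.contrib.layers.flatten(P2)
    Z3 = tf.contrib.layers.fully_connected(F2, 6, activation_fn=None)
    return {'Z1': Z1, 'A1': A1, 'P1': P1, 'Z2': Z2,
            'A2': A2, 'P2': P2, 'F2': F2, 'Z3': Z3}

# Inside the session, instead of Z3 = forward_propagation(X, parameters):
# tensors = forward_propagation_debug(X, parameters)
# values = sess.run(tensors, {X: np.random.randn(2, 64, 64, 3)})
# for name, value in sorted(values.items()):
#     print(name, value.ravel()[:4])   # only Z3 diverges between 1.3 and 1.4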

The function signature did not change so I have no idea what happens under the hood.

I get the following warnings with 1.3 and earlier, which disappear with 1.4 and later:

2018-04-09 23:13:39.954455: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2018-04-09 23:13:39.954495: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2018-04-09 23:13:39.954508: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2018-04-09 23:13:39.954521: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.

I was wondering whether something changed in the default initialization of the parameters. Anyway, this is where I stand right now. I can go ahead with the course, but I am a bit frustrated that I can't get a final call on this issue. Is this a known behaviour, or was a bug introduced somewhere?

Besides, when completing the assignment, the final model is expected to reach a test accuracy of 0.78 on an image recognition task after 100 epochs. This is exactly what happens with 1.3 and earlier, but the accuracy drops to 0.58 with 1.4 and later, everything else being equal. That is a huge difference. A longer training run might erase it, but it is not a slight one, so it seems worth mentioning.

Any comment / suggestion welcome.

Thanks,

Laurent



1 Answer

So here's the breakdown. The problem, somewhat surprisingly, is caused by tf.contrib.layers.flatten(), because it changes the random seed differently in the two versions. There are two ways to seed the random number generator in TensorFlow: either you seed the whole graph with tf.set_random_seed(), or you specify a seed argument on the individual operations that accept one. As per the docs on tf.set_random_seed(), note point 2:

Operations that rely on a random seed actually derive it from two seeds: the graph-level and operation-level seeds. This sets the graph-level seed.

Its interactions with operation-level seeds are as follows:

  1. If neither the graph-level nor the operation seed is set: A random seed is used for this op.
  2. If the graph-level seed is set, but the operation seed is not: The system deterministically picks an operation seed in conjunction with the graph-level seed so that it gets a unique random sequence.
  3. If the graph-level seed is not set, but the operation seed is set: A default graph-level seed and the specified operation seed are used to determine the random sequence.
  4. If both the graph-level and the operation seed are set: Both seeds are used in conjunction to determine the random sequence.

In our case the seed is set at the graph level, and TensorFlow performs a deterministic calculation to derive the actual seed used by each operation. This calculation apparently depends on the number of operations in the graph as well.

In addition, the implementation of tf.contrib.layers.flatten() changed exactly between versions 1.3 and 1.4. You can look it up in the repository; basically the code was simplified and moved from tensorflow/contrib/layers/python/layers/layers.py into tensorflow/python/layers/core.py. The important part for us is that the change altered the number of operations performed, thereby changing the operation seed derived for the Xavier initializer on your fully connected layer.

A possible workaround is to specify the seed for each weight tensor separately, for example by passing a seeded initializer to the layer instead of relying only on the graph-level seed. If you were only interested in this to be sure there is no issue with your code, then rest assured.
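For example, along these lines (a sketch, not tested against the course grader; it relies on fully_connected()'s weights_initializer argument):

# Workaround sketch: give the layer's weight initializer its own
# operation-level seed, so the seed derived from the graph-level seed
# (and hence from the op count) no longer matters.
Z3 = tf.contrib.layers.fully_connected(
    F2, 6,
    activation_fn=None,
    weights_initializer=tf.contrib.layers.xavier_initializer(seed=0))

This makes the initialization reproducible across versions, although it will not reproduce the exact 1.3 numbers, since those came from a differently derived seed.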

A minimal example to reproduce the behavior; note the commented-out line starting with Xf:

import numpy as np
import tensorflow as tf

tf.reset_default_graph()
tf.set_random_seed(1)
with tf.Session() as sess:
    X = tf.constant( [ [ 1, 2, 3, 4, 5, 6 ] ], tf.float32 )
    #Xf = tf.contrib.layers.flatten( X )   # merely creating this op (it is never run) shifts the seed derived for R
    R = tf.random_uniform( shape = () )
    R_V = sess.run( R )
print( R_V )

If you run this code as above, you get a printout of:

0.38538742

for both versions. If you uncomment the Xf line, you get

0.013653636

and

0.6033112

for versions 1.3 and 1.4 respectively. Interestingly, Xf is never even executed; simply creating it is enough to cause the issue.
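Not part of the original example, but as a quick cross-check: if the random op is given its own seed, the extra op no longer matters, which is exactly why the per-tensor-seed workaround above works. A sketch:

# Variation on the example above: pinning an operation-level seed makes
# R's value independent of whether the flatten op exists in the graph.
import tensorflow as tf

tf.reset_default_graph()
tf.set_random_seed(1)
with tf.Session() as sess:
    X = tf.constant( [ [ 1, 2, 3, 4, 5, 6 ] ], tf.float32 )
    Xf = tf.contrib.layers.flatten( X )        # extra op, as in the example above (never executed)
    R = tf.random_uniform( shape = (), seed = 2 )   # explicit op-level seed
    print( sess.run( R ) )                     # should print the same value with or without Xf, on 1.3 and 1.4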

Two final notes. First, the four warnings you get with 1.3 are not related to this; they only report compilation options that could speed up some calculations.

Second, this should not affect the training behavior of your code, since this issue only changes the random seed. So there must be some other difference causing the slower learning you observe.
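If you want to confirm the seed explanation on your side, one quick check (not from the original answer, just a sketch) is to dump the freshly initialized variables after sess.run(init) in your snippet:

# Sketch: inspect the initialized variables across TF installs.
# W1 and W2 use xavier_initializer(seed=0), so they should match between
# 1.3 and 1.4; the fully_connected weights pick up the derived seed and differ.
for v in tf.global_variables():
    print(v.name, sess.run(v).ravel()[:3])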

Answered by Peter Szoldan