I was recently completing a CNN implementation in TensorFlow for an online course (which I would prefer not to name, to avoid breaking the platform's rules) and ran into surprising results: my local implementation diverged significantly from the one on the platform's server. After further investigation, I traced the problem to a change in the behaviour of tf.contrib.layers.fully_connected() between versions 1.3 and 1.4 of TensorFlow.
I prepared a small subset of the source code to reproduce the issue:
import numpy as np
import tensorflow as tf

np.random.seed(1)

def create_placeholders(n_H0, n_W0, n_C0, n_y):
    # Placeholders for a batch of images and the corresponding one-hot labels
    X = tf.placeholder(tf.float32, [None, n_H0, n_W0, n_C0])
    Y = tf.placeholder(tf.float32, [None, n_y])
    return X, Y

def initialize_parameters():
    tf.set_random_seed(1)
    W1 = tf.get_variable("W1", [4, 4, 3, 8], initializer=tf.contrib.layers.xavier_initializer(seed=0))
    W2 = tf.get_variable("W2", [2, 2, 8, 16], initializer=tf.contrib.layers.xavier_initializer(seed=0))
    parameters = {"W1": W1, "W2": W2}
    return parameters

def forward_propagation(X, parameters):
    W1 = parameters['W1']
    W2 = parameters['W2']
    # CONV -> RELU -> MAXPOOL, twice, then FLATTEN -> FC
    Z1 = tf.nn.conv2d(X, W1, strides=[1, 1, 1, 1], padding='SAME')
    A1 = tf.nn.relu(Z1)
    P1 = tf.nn.max_pool(A1, ksize=[1, 8, 8, 1], strides=[1, 8, 8, 1], padding='SAME')
    Z2 = tf.nn.conv2d(P1, W2, strides=[1, 1, 1, 1], padding='SAME')
    A2 = tf.nn.relu(Z2)
    P2 = tf.nn.max_pool(A2, ksize=[1, 4, 4, 1], strides=[1, 4, 4, 1], padding='SAME')
    F2 = tf.contrib.layers.flatten(P2)
    Z3 = tf.contrib.layers.fully_connected(F2, 6, activation_fn=None)
    return Z3

tf.reset_default_graph()

with tf.Session() as sess:
    np.random.seed(1)
    X, Y = create_placeholders(64, 64, 3, 6)
    parameters = initialize_parameters()
    Z3 = forward_propagation(X, parameters)
    init = tf.global_variables_initializer()
    sess.run(init)
    a = sess.run(Z3, {X: np.random.randn(2, 64, 64, 3), Y: np.random.randn(2, 6)})
    print("Z3 = " + str(a))
When running TensorFlow 1.3 and earlier (1.2.1 tested as well), the output for Z3 is:
Z3 = [[-0.44670227 -1.57208765 -1.53049231 -2.31013036 -1.29104376 0.46852064]
[-0.17601591 -1.57972014 -1.4737016 -2.61672091 -1.00810647 0.5747785 ]]
When running TensorFlow 1.4 and later (tested up to 1.7), the output for Z3 is:
Z3 = [[ 1.44169843 -0.24909666 5.45049906 -0.26189619 -0.20669907 1.36546707]
[ 1.40708458 -0.02573211 5.08928013 -0.48669922 -0.40940708 1.26248586]]
A detailed review of all the tensors in forward_propagation() (i.e. Wx, Ax, Px, etc.) points to tf.contrib.layers.fully_connected(), since Z3 is the only diverging tensor. The function signature did not change, so I have no idea what happens under the hood.
I also get warnings with 1.3 and before which disappear with 1.4 and beyond:
2018-04-09 23:13:39.954455: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2018-04-09 23:13:39.954495: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2018-04-09 23:13:39.954508: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2018-04-09 23:13:39.954521: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
I was wondering whether something changed in the default initialization of the parameters. Anyway, this is where I am right now. I can go ahead with the course, but I feel a bit frustrated that I can't get a final call on this issue. Is this a known behaviour, or was a bug introduced somewhere?
Besides, when completing the assignment, the final model is expected to deliver a test accuracy of 0.78 on an image recognition task after 100 epochs. This is precisely what happens with 1.3 and earlier, but the accuracy drops to 0.58 with 1.4 and later, everything else being equal. This is a huge difference. I guess that longer training might erase the gap, but it is not a slight one, so it seems worth mentioning.
Any comment / suggestion welcome.
Thanks,
Laurent
According to an RFC document from August 2018, tf.contrib will be deleted, with some of its parts becoming standalone projects (such as tensorflow/probability). In general, tf.contrib contains contributed code: features and contributions that should eventually be merged into core TensorFlow, but whose interfaces may still change, or which need further testing to see whether they find broader acceptance.
In TensorFlow 2.0 the replacement is tf.keras.layers.Dense, but more importantly, you have to migrate your codebase to Keras; tf.contrib.layers.fully_connected() is gone entirely.
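For illustration, here is a minimal sketch of the TF 2.x equivalent of the fully connected layer in the code above (the input shape [2, 1024] is made up for the example):

import tensorflow as tf  # TensorFlow 2.x

# Equivalent of tf.contrib.layers.fully_connected(F2, 6, activation_fn=None):
# a Dense layer with 6 units and no activation (linear output).
dense = tf.keras.layers.Dense(units=6, activation=None)

x = tf.random.normal([2, 1024])  # a batch of 2 flattened feature vectors
z3 = dense(x)                    # weights are created on the first call
print(z3.shape)                  # (2, 6)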
So here's the breakdown. The problem, somewhat surprisingly, is caused by tf.contrib.layers.flatten(), because it changes the random seed differently in the different versions. There are two ways to seed the random number generator in TensorFlow: either you seed it for the whole graph with tf.set_random_seed(), or you specify a seed argument where it makes sense. As per the docs on tf.set_random_seed(), note point 2:
Operations that rely on a random seed actually derive it from two seeds: the graph-level and operation-level seeds. This sets the graph-level seed.
Its interactions with operation-level seeds are as follows:
- If neither the graph-level nor the operation seed is set: A random seed is used for this op.
- If the graph-level seed is set, but the operation seed is not: The system deterministically picks an operation seed in conjunction with the graph-level seed so that it gets a unique random sequence.
- If the graph-level seed is not set, but the operation seed is set: A default graph-level seed and the specified operation seed are used to determine the random sequence.
- If both the graph-level and the operation seed are set: Both seeds are used in conjunction to determine the random sequence.
In our case the seed is set at the graph level, and TensorFlow does some deterministic calculation to derive the actual seed to use in each operation. This calculation apparently depends on the number of operations in the graph as well.
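For contrast, here is a small sketch (TF 1.x API; the seed value 42 is arbitrary) showing the fourth case from the docs quoted above: once an operation-level seed is pinned, the drawn value no longer depends on how many other ops exist in the graph.

import tensorflow as tf

tf.reset_default_graph()
tf.set_random_seed(1)

with tf.Session() as sess:
    # With an explicit operation-level seed, the random sequence is derived
    # from (graph_seed, op_seed) directly, so adding or removing ops
    # elsewhere in the graph does not shift the result.
    R = tf.random_uniform(shape=(), seed=42)
    print(sess.run(R))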
In addition, the implementation of tf.contrib.layers.flatten() changed exactly between versions 1.3 and 1.4. You can look it up in the repository: basically, the code was simplified and moved from tensorflow/contrib/layers/python/layers/layers.py into tensorflow/python/layers/core.py. For us the important part is that it changed the number of operations performed, thereby changing the random seed applied by the Xavier initializer on your fully connected layer.
A possible workaround is to specify the seed for each weight tensor separately, but that would require either building the fully connected layer manually or touching the TensorFlow code. If you only wanted to be sure there is no issue with your own code, then rest assured: there isn't.
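As one way to do the former, here is a sketch of a manually built fully connected layer with a pinned initializer seed (TF 1.x API; the function name fully_connected_seeded and the default seed value are made up for illustration):

import tensorflow as tf

def fully_connected_seeded(x, num_outputs, seed=1):
    # Manual replacement for tf.contrib.layers.fully_connected(x, num_outputs,
    # activation_fn=None): the weight initializer carries an explicit
    # operation-level seed, so it no longer depends on the graph's op count.
    n_in = int(x.get_shape()[-1])
    W = tf.get_variable("fc_W", [n_in, num_outputs],
                        initializer=tf.contrib.layers.xavier_initializer(seed=seed))
    b = tf.get_variable("fc_b", [num_outputs],
                        initializer=tf.zeros_initializer())
    return tf.matmul(x, W) + b  # no activation, matching activation_fn=None

# In forward_propagation() above, the last layer would then become:
# Z3 = fully_connected_seeded(F2, 6)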
Minimal example to reproduce behavior, note the commented out line starting with Xf:
import numpy as np
import tensorflow as tf

tf.reset_default_graph()
tf.set_random_seed(1)  # graph-level seed only, no operation-level seeds

with tf.Session() as sess:
    X = tf.constant([[1, 2, 3, 4, 5, 6]], tf.float32)
    #Xf = tf.contrib.layers.flatten(X)  # uncommenting changes the op count
    R = tf.random_uniform(shape=())
    R_V = sess.run(R)
    print(R_V)
If you run this code as above, you get a printout of:
0.38538742
for both versions. If you uncomment the Xf line, you get
0.013653636
and
0.6033112
for versions 1.3 and 1.4 respectively. Interestingly, Xf is never even executed; simply creating it is enough to cause the issue, because it changes the number of operations in the graph.
Two final notes. First, the four warnings you get with 1.3 are not related to this; they merely report CPU instruction sets (SSE4.2, AVX, AVX2, FMA) that could speed up some calculations but that your TensorFlow build was not compiled to use. Second, this should not affect the training behaviour of your code, since this issue changes only the random seed; there must be some other difference causing the slower learning you observe.