I want reproducible results for the CNNs I train. Hence I set the seeds at the top of my script:
import tensorflow as tf
tf.set_random_seed(0) # make sure results are reproducible
import numpy as np
np.random.seed(0) # make sure results are reproducible
The docs of set_random_seed and np.random.seed do not report any special behaviour for a seed of 0.
When I run the same script twice on the same machine within a couple of minutes and without making any updates, I expect to get the same results. However, this is not the case:
Run 1:
0;0.001733;0.001313
500;0.390164;0.388188
Run 2:
0;0.006986;0.007000
500;0.375288;0.374250
How can I make the network produce reproducible results?
$ python -c "import tensorflow;print(tensorflow.__version__)"
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcurand.so.8.0 locally
1.0.0
$ python -c "import numpy;print(numpy.__version__)"
1.12.0
While I didn't completely solve the problem, here are possible reasons why the results are not always the same, roughly ordered from most likely / easiest to fix to most unlikely / hardest to fix. After each problem I try to give a solution.
Create a log file like 2017-12-31-23-54-experiment-result.log for every single experiment you run. Not manually; the experiment itself should create it. Yes, with the timestamp in the name, so it is easier to find again. Everything that follows should be logged to that file for each single experiment. In any case, running the "same" thing multiple times might help to get a gut feeling for how much the results vary.
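A minimal sketch of how an experiment could create such a file (the log helper and the file name pattern are only illustrations, not a fixed API):
#!/usr/bin/env python
import datetime

# Name the log file after the current time, e.g.
# 2017-12-31-23-54-experiment-result.log
timestamp = datetime.datetime.now().strftime("%Y-%m-%d-%H-%M")
log_filename = "{}-experiment-result.log".format(timestamp)

def log(message):
    """Append one line to this experiment's log file."""
    with open(log_filename, "a") as f:
        f.write(message + "\n")

log("starting experiment")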
If you write a paper, I think the following would be best practice for reproducibility: ship a requirements.txt in which you give the exact software versions, not something like tensorflow>=1.0.0 but tensorflow==1.2.3.
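If your experiments run in a clean virtual environment, pip can write such a pinned file for you:
$ pip freeze > requirements.txt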
For logging the versions, you might want to use something like this:
#!/usr/bin/env python

# core modules
import subprocess


def get_logstring():
    """
    Get important environment information that might influence experiments.

    Returns
    -------
    logstring : str
    """
    logstring = []

    # CPU model
    with open('/proc/cpuinfo') as f:
        cpuinfo = f.readlines()
    for line in cpuinfo:
        if "model name" in line:
            logstring.append("CPU: {}".format(line.strip()))
            break

    # NVIDIA driver version
    with open('/proc/driver/nvidia/version') as f:
        version = f.read().strip()
    logstring.append("GPU driver: {}".format(version))

    # Compiler version; the last line of `gcc -v` is the
    # "gcc version ..." summary shown in the output below
    gcc = subprocess.check_output(['gcc', '-v'], stderr=subprocess.STDOUT)
    logstring.append("GCC version: {}".format(
        gcc.decode('utf-8').strip().splitlines()[-1]))

    # Graphics hardware
    logstring.append("VGA: {}".format(find_vga()))
    return "\n".join(logstring)


def find_vga():
    vga = subprocess.check_output(r"lspci | grep -i 'vga\|3d\|2d'",
                                  shell=True,
                                  executable='/bin/bash')
    return vga.decode('utf-8').strip()  # bytes -> str on Python 3


print(get_logstring())
which gives something like
CPU: model name : Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz
GPU driver: NVRM version: NVIDIA UNIX x86_64 Kernel Module 384.90 Tue Sep 19 19:17:35 PDT 2017
GCC version: gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.5)
VGA: 00:02.0 VGA compatible controller: Intel Corporation Skylake Integrated Graphics (rev 06)
02:00.0 3D controller: NVIDIA Corporation GM108M [GeForce 940MX] (rev a2)
Might be a scope problem. Make sure to set the seed within the scope in which you're using the graph, i.e. call tf.set_random_seed(0) directly after entering with tf.Graph().as_default():. This also has to be done again after every call to tf.reset_default_graph().
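A minimal sketch of that pattern (the random_normal op is only there to produce seed-dependent values):
import tensorflow as tf

graph = tf.Graph()
with graph.as_default():
    # Set the graph-level seed inside the graph's scope, before
    # creating the ops whose randomness it should control.
    tf.set_random_seed(0)
    x = tf.random_normal([2, 2])

with tf.Session(graph=graph) as sess:
    print(sess.run(x))  # identical across runs of this script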
For a full example, see How to get stable results with TensorFlow, setting random seed