I get different results (test accuracy) every time I run the imdb_lstm.py
example from Keras framework (https://github.com/fchollet/keras/blob/master/examples/imdb_lstm.py)
The code contains np.random.seed(1337)
at the top, before any Keras imports. This should prevent it from generating different numbers on every run. What am I missing?
UPDATE: How to repro:
UPDATE2: I'm running it on Windows 8.1 with MinGW/msys, module versions:
theano 0.7.0
numpy 1.8.1
scipy 0.14.0c1
UPDATE3: I narrowed the problem down a bit. If I run the example on the GPU (Theano flag device=gpu0) then I get a different test accuracy every time, but if I run it on the CPU then everything works as expected. My graphics card: NVIDIA GeForce GT 635.
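For reference, a minimal way to switch devices from inside the script (Theano reads the THEANO_FLAGS environment variable once, at import time, so it has to be set before the first theano/keras import):

import os
# 'device=cpu' gives reproducible runs for me; 'device=gpu0' reproduces the issue
os.environ['THEANO_FLAGS'] = 'device=cpu'

import theano  # picks up the flags set above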
You can use this utility to make almost any Keras program fully deterministic. Some limitations apply where network communication is involved (e.g. parameter-server distribution), which introduces additional sources of randomness, or where certain non-deterministic cuDNN ops are involved.
Keras gets its source of randomness from the NumPy random number generator, so this must be seeded regardless of whether you are using the Theano or the TensorFlow backend. It must be seeded by calling np.random.seed() at the top of the file, before any other imports or other code.
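A minimal sketch of what "top of the file" means in practice (1337 is just the seed value used in the imdb_lstm.py example):

import numpy as np
np.random.seed(1337)  # seed NumPy first, before Keras is imported

from keras.models import Sequential  # only now import Keras
from keras.layers import Dense, LSTM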
Conceptually, the seed value initializes the random number generator, and every time you use the same seed value you will get the same sequence of random values. In Python, the built-in method is random.seed(a, version).
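A quick demonstration with Python's built-in generator:

import random

random.seed(42)
print(random.random())  # 0.6394267984578837

random.seed(42)
print(random.random())  # the same value again, because the seed is the same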
You can find the answer at the Keras docs: https://keras.io/getting-started/faq/#how-can-i-obtain-reproducible-results-using-keras-during-development.
In short, to be absolutely sure that you will get reproducible results with your Python script on one computer's/laptop's CPU, you will have to do the following:
1. Set the PYTHONHASHSEED environment variable at a fixed value
2. Set the python built-in pseudo-random generator at a fixed value
3. Set the numpy pseudo-random generator at a fixed value
4. Set the tensorflow pseudo-random generator at a fixed value
5. Configure a new global tensorflow session

Following the Keras link at the top, the source code I am using is the following:
# Seed value
# Apparently you may use different seed values at each stage
seed_value = 0
# 1. Set the `PYTHONHASHSEED` environment variable at a fixed value
#    (note: to affect hash randomization of the current interpreter itself,
#    the variable must be set in the shell before Python starts; setting it
#    here still covers subprocesses)
import os
os.environ['PYTHONHASHSEED'] = str(seed_value)
# 2. Set the `python` built-in pseudo-random generator at a fixed value
import random
random.seed(seed_value)
# 3. Set the `numpy` pseudo-random generator at a fixed value
import numpy as np
np.random.seed(seed_value)
# 4. Set the `tensorflow` pseudo-random generator at a fixed value
import tensorflow as tf
tf.set_random_seed(seed_value)
# for later versions:
# tf.compat.v1.set_random_seed(seed_value)
# 5. Configure a new global `tensorflow` session
from keras import backend as K
session_conf = tf.ConfigProto(intra_op_parallelism_threads=1, inter_op_parallelism_threads=1)
sess = tf.Session(graph=tf.get_default_graph(), config=session_conf)
K.set_session(sess)
# for later versions:
# session_conf = tf.compat.v1.ConfigProto(intra_op_parallelism_threads=1, inter_op_parallelism_threads=1)
# sess = tf.compat.v1.Session(graph=tf.compat.v1.get_default_graph(), config=session_conf)
# tf.compat.v1.keras.backend.set_session(sess)
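For TF2-only setups, a fuller sketch of the same steps (assuming tf.keras; there is no global session to configure, and tf.config.threading replaces the ConfigProto thread settings):

import os
import random
import numpy as np
import tensorflow as tf

seed_value = 0
os.environ['PYTHONHASHSEED'] = str(seed_value)  # ideally set before Python starts
random.seed(seed_value)
np.random.seed(seed_value)
tf.random.set_seed(seed_value)

# Single-threaded op execution removes one source of nondeterminism
tf.config.threading.set_intra_op_parallelism_threads(1)
tf.config.threading.set_inter_op_parallelism_threads(1)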
Needless to say, you do not have to specify any seed or random_state for the numpy, scikit-learn, or tensorflow/keras functions that you use in your Python script, precisely because with the source code above we set all of their pseudo-random generators globally to fixed values.
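For instance (a sketch assuming scikit-learn is installed): train_test_split falls back to numpy's global RNG when random_state is omitted, so the global seed above already makes it deterministic:

import numpy as np
np.random.seed(0)  # same global seeding as above

from sklearn.model_selection import train_test_split

X = np.arange(10).reshape(5, 2)
# No random_state argument needed: with the global seed set,
# the split is identical on every run of the script.
X_train, X_test = train_test_split(X, test_size=0.4)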
Theano's documentation talks about the difficulties of seeding random variables and why they seed each graph instance with its own random number generator.
Sharing a random number generator between different RandomOp instances makes it difficult to produce the same stream regardless of other ops in the graph, and to keep RandomOps isolated. Therefore, each RandomOp instance in a graph will have its very own random number generator. That random number generator is an input to the function. In typical usage, we will use the new features of function inputs (value, update) to pass and update the rng for each RandomOp. By passing RNGs as inputs, it is possible to use the normal methods of accessing function inputs to access each RandomOp's rng. In this approach there is no pre-existing mechanism to work with the combined random number state of an entire graph. So the proposal is to provide the missing functionality (the last three requirements) via auxiliary functions: seed, getstate, setstate.
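In practice, this per-variable RNG setup looks like the RandomStreams example from the Theano tutorial (a sketch; rv_u and rv_n are the same names used in the snippet quoted below):

from theano import function
from theano.tensor.shared_randomstreams import RandomStreams

srng = RandomStreams(seed=234)  # object-level seed for all member streams
rv_u = srng.uniform((2, 2))     # each random variable gets its own RNG
rv_n = srng.normal((2, 2))

f = function([], rv_u)  # each call to f() advances rv_u's own generator
g = function([], rv_n)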
They also provide examples on how to seed all the random number generators.
You can also seed all of the random variables allocated by a RandomStreams object by that object’s seed method. This seed will be used to seed a temporary random number generator, that will in turn generate seeds for each of the random variables.
>>> srng.seed(902340) # seeds rv_u and rv_n with different seeds each
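Building on the sketch above, re-seeding the RandomStreams object re-initializes every member stream, so repeated draws can be reproduced (a usage note, not part of the quoted docs):

srng.seed(902340)
a = f()            # first draw after seeding
srng.seed(902340)  # re-seed: every member stream is re-initialized
b = f()
assert (a == b).all()  # identical draws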