 

How to Get Reproducible Results (Keras, TensorFlow)

To make the results reproducible I've read more than 20 articles and added as many of the suggested functions to my script as I could ... but it still fails.

In the official documentation I read that there are two kinds of seeds: global and operation-level. Maybe the key to solving my problem is setting the operation-level seed, but I don't understand where to apply it.

Would you please help me achieve reproducible results with TensorFlow (version > 2.0)? Thank you very much.

from keras.models import Sequential
from keras.layers import Dense
import numpy as np
import pandas as pd
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from keras.optimizers import Adam
from sklearn.preprocessing import MinMaxScaler


np.random.seed(7)
import tensorflow as tf
tf.random.set_seed(7)  # analogue of TF1's set_random_seed(seed_value)
import random
random.seed(7)
tf.random.uniform([1], seed=1)  # op-level seed; the result is discarded, so this call changes nothing else
tf.Graph.as_default  # no-op: references the method without calling it (intended analogue of tf.get_default_graph().finalize())

rng = tf.random.experimental.Generator.from_seed(1234)
rng.uniform((), 5, 10, tf.int64)  # draw a random scalar (0-D tensor) between 5 and 10

df = pd.read_csv("s54.csv", 
                 delimiter = ';', 
                 decimal=',', 
                 dtype = object).apply(pd.to_numeric).fillna(0)

#data normalization
scaler = MinMaxScaler() 
scaled_values = scaler.fit_transform(df) 
df.loc[:,:] = scaled_values


X_train, X_test, y_train, y_test = train_test_split(df.iloc[:,1:],
                                                    df.iloc[:,:1],
                                                    test_size=0.2,
                                                    random_state=7,
                                                    stratify = df.iloc[:,:1])

model = Sequential()
model.add(Dense(1200, input_dim=len(X_train.columns), activation='relu'))  
model.add(Dense(150, activation='relu'))
model.add(Dense(80, activation='relu'))
model.add(Dense(10, activation='relu'))
model.add(Dense(1, activation='sigmoid')) 

loss="binary_crossentropy"
optimizer=Adam(lr=0.01)
metrics=['accuracy']
epochs = 2
batch_size = 32
verbose = 0

model.compile(loss=loss,  
              optimizer=optimizer, 
              metrics=metrics) 
model.fit(X_train, y_train, epochs = epochs, batch_size=batch_size, verbose = verbose)
predictions = model.predict(X_test)
tn, fp, fn, tp = confusion_matrix(y_test, predictions > 0.5).ravel()
asked Apr 07 '20 by Alex Ivanov

2 Answers

As a reference, from the documentation for tf.random.set_seed:

Operations that rely on a random seed actually derive it from two seeds: the global and operation-level seeds. tf.random.set_seed sets the global seed.

Its interactions with operation-level seeds are as follows:

  1. If neither the global seed nor the operation seed is set: A randomly picked seed is used for this op.
  2. If the operation seed is not set but the global seed is set: The system picks an operation seed from a stream of seeds determined by the global seed.
  3. If the operation seed is set, but the global seed is not set: A default global seed and the specified operation seed are used to determine the random sequence.
  4. If both the global and the operation seed are set: Both seeds are used in conjunction to determine the random sequence.

1st Scenario

A random seed is picked by default. This is easy to notice in the results: the values differ every time you re-run the program or call the code again within the same run.

x_train = tf.random.normal((10,1), 1, 1, dtype=tf.float32)  # no seed set: different values on every run
print(x_train)

2nd Scenario

The global seed is set but no operation seed is set, so the first and second calls produce different tensors (each op draws its own seed from a stream determined by the global seed). However, if you re-run or restart the program, both calls reproduce exactly the same values as before.

tf.random.set_seed(2)  # global seed only
first = tf.random.normal((10,1), 1, 1, dtype=tf.float32)
print(first)
sec = tf.random.normal((10,1), 1, 1, dtype=tf.float32)  # differs from `first`, but both repeat across runs
print(sec)

3rd Scenario

Here the operation seed is set but the global seed is not. If you re-run the code within the same session it gives different results, but if you restart the runtime it gives the same sequence of results as the previous run.

x_train = tf.random.normal((10,1), 1, 1, dtype=tf.float32, seed=2)  # op-level seed only
print(x_train)

4th Scenario

Both seeds are used together to determine the random sequence. Changing the global or operation seed gives different results, but restarting the runtime with the same seeds still gives the same results.

tf.random.set_seed(3)  # global seed
x_train = tf.random.normal((10,1), 1, 1, dtype=tf.float32, seed=1)  # op-level seed
print(x_train) 

Here is reproducible code as a reference. By setting the global seed, it always gives the same results.

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
import numpy as np
import pandas as pd
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

## GLOBAL SEED ##                                                   
tf.random.set_seed(3)
x_train = tf.random.normal((10,1), 1, 1, dtype=tf.float32)
y_train = tf.math.sin(x_train)
x_test = tf.random.normal((10,1), 2, 3, dtype=tf.float32)
y_test = tf.math.sin(x_test)

model = Sequential()
model.add(Dense(1200, input_shape=(1,), activation='relu'))  
model.add(Dense(150, activation='relu'))
model.add(Dense(80, activation='relu'))
model.add(Dense(10, activation='relu'))
model.add(Dense(1, activation='sigmoid')) 

loss="binary_crossentropy"
optimizer=tf.keras.optimizers.Adam(learning_rate=0.01)
metrics=['mse']
epochs = 5
batch_size = 32
verbose = 1

model.compile(loss=loss,  
              optimizer=optimizer, 
              metrics=metrics) 
history = model.fit(x_train, y_train, epochs = epochs, batch_size=batch_size, verbose = verbose)
predictions = model.predict(x_test)
print(predictions)

Note: if you are using TensorFlow 2 or higher, Keras is already part of the API, so you should use tf.keras rather than the standalone keras package.
All of these examples were run on Google Colab.
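
For instance, the question's imports could be rewritten against tf.keras (a minimal sketch; only the import paths change, the class names stay the same):

import tensorflow as tf
from tensorflow.keras.models import Sequential   # instead of keras.models
from tensorflow.keras.layers import Dense        # instead of keras.layers
from tensorflow.keras.optimizers import Adam     # instead of keras.optimizers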

answered Sep 19 '22 by TF_Support


As of TensorFlow 2.8, there is tf.config.experimental.enable_op_determinism().

You can ensure reproducibility, even on GPU, with:

import tensorflow as tf

tf.keras.utils.set_random_seed(42)  # sets seeds for base-python, numpy and tf
tf.config.experimental.enable_op_determinism()
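
Applied to a Keras workflow like the one in the question, a minimal sketch could look like this (the data and model below are placeholders, not the asker's CSV; note that enable_op_determinism can slow training down):

import numpy as np
import tensorflow as tf

tf.keras.utils.set_random_seed(42)              # seeds Python's random, NumPy, and TensorFlow at once
tf.config.experimental.enable_op_determinism()  # forces deterministic op implementations (TF >= 2.8)

# placeholder data standing in for the CSV from the question
x = np.random.rand(100, 8).astype("float32")
y = (x.sum(axis=1) > 4).astype("float32")

model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, input_shape=(8,), activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])
model.fit(x, y, epochs=2, batch_size=32, verbose=0)

# repeated runs of this script should now produce identical weights and predictions
print(model.predict(x[:3], verbose=0))

Because set_random_seed also seeds NumPy, the placeholder data above is generated identically on every run.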
answered Sep 20 '22 by loki