
Why does random.seed() not work when generating a dataset?

I'm creating a dataset for testing with

import random
from sklearn.datasets import make_regression

random.seed(10)
X, y = make_regression(n_samples = 1000, n_features = 10)
X[0:2]

Could you please explain why I get a different dataset on each run? For example, two runs return

array([[-0.28058959, -0.00570283,  0.31728106,  0.52745066,  1.69651572,
        -0.37038286,  0.67825801, -0.71782482, -0.29886242,  0.07891646],
       [ 0.73872413, -0.27472164, -1.70298606, -0.59211593,  0.04060707,
         1.39661574, -1.25656819, -0.79698442, -0.38533316,  0.65484856]])

and

array([[ 0.12493586,  1.01388974,  1.2390685 , -0.13797227,  0.60029193,
        -1.39268898, -0.49804303,  1.31267837,  0.11774784,  0.56224193],
       [ 0.47067323,  0.3845262 ,  1.22959284, -0.02913909, -1.56481745,
        -1.56479078,  2.04082295, -0.22561445, -0.37150552,  0.91750366]])
Akira asked Jan 25 '23 17:01


2 Answers

You need to pass the seed to the make_regression call as a parameter:

sklearn.datasets.make_regression(n_samples=100, n_features=100, n_informative=10,
                                 n_targets=1, bias=0.0, effective_rank=None,
                                 tail_strength=0.5, noise=0.0, shuffle=True,
                                 coef=False, random_state= None )
                                             ^°^°^°^°^°^°^°^°^°

See API:

random_state : int, RandomState instance or None, optional (default=None)

If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.

So in your case:

X, y = make_regression(n_samples = 1000, n_features = 10, random_state = 10)
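
The forms of random_state quoted from the API above can be compared side by side. This is only a minimal sketch: the variable names are illustrative, and it assumes that an int seed is turned into a fresh NumPy RandomState on every call while a RandomState instance is used as-is.

```python
import numpy as np
from sklearn.datasets import make_regression

# An int seed: each call starts from a generator seeded the same way,
# so repeated calls return the same data.
X_a, y_a = make_regression(n_samples=1000, n_features=10, random_state=10)
X_b, y_b = make_regression(n_samples=1000, n_features=10, random_state=10)
same_with_int_seed = np.array_equal(X_a, X_b) and np.array_equal(y_a, y_b)

# A RandomState instance: the generator is consumed as it is used, so a
# second call with the SAME instance continues the stream and returns
# different data.
rng = np.random.RandomState(10)
X_c, _ = make_regression(n_samples=1000, n_features=10, random_state=rng)
X_d, _ = make_regression(n_samples=1000, n_features=10, random_state=rng)
first_call_matches_int_seed = np.array_equal(X_a, X_c)
second_call_differs = not np.array_equal(X_c, X_d)

print(same_with_int_seed, first_call_matches_int_seed, second_call_differs)
```

If you want every call to return the same dataset, passing an int is the simpler choice; a shared RandomState instance is more useful when several calls should draw from one reproducible stream.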
Patrick Artner answered Jan 27 '23 06:01


Setting the random_state argument in make_regression, as already suggested, resolves the issue, but it is worth clarifying exactly why your own code snippet does not work as expected. The reason is that, as implied in the docs, make_regression uses the random number generator (RNG) from NumPy, not the one from the Python random module that your code seeds.

So, changing your code snippet only slightly to

import numpy as np # change 1
from sklearn.datasets import make_regression

np.random.seed(10) # change 2
X, y = make_regression(n_samples = 1000, n_features = 10) # no random_state set here
X[0:2]

always results in the same dataset:

array([[-1.32553507, -1.34894938, -0.82160306,  0.03538905, -0.68611315,
        -0.74469132,  1.37391771,  0.98675482, -0.90921643, -1.57943748],
       [ 1.13660812,  0.52367005,  0.05090828, -0.47210149, -0.98592548,
        -0.69677968,  0.31752274, -0.0771912 ,  2.17548753,  0.75189637]])

which is in fact identical to the one produced by setting random_state=10 in make_regression:

X, y = make_regression(n_samples = 1000, n_features = 10, random_state=10)
X[0:2]

# result:

array([[-1.32553507, -1.34894938, -0.82160306,  0.03538905, -0.68611315,
        -0.74469132,  1.37391771,  0.98675482, -0.90921643, -1.57943748],
       [ 1.13660812,  0.52367005,  0.05090828, -0.47210149, -0.98592548,
        -0.69677968,  0.31752274, -0.0771912 ,  2.17548753,  0.75189637]])
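
Rather than comparing the printed arrays by eye, the same three claims can be checked programmatically. This is a rough sketch, not part of the original snippets: first_rows is a hypothetical helper introduced only for this illustration, and the initial np.random.seed(0) call is there just to put NumPy's global RNG in a known state.

```python
import random

import numpy as np
from sklearn.datasets import make_regression


def first_rows(**kwargs):
    """Hypothetical helper: first two rows of a freshly generated dataset."""
    X, _ = make_regression(n_samples=1000, n_features=10, **kwargs)
    return X[0:2]


# 1) Seeding Python's random module does not touch NumPy's global RNG.
np.random.seed(0)      # known starting state for NumPy's global RNG
random.seed(10)
run_1 = first_rows()
random.seed(10)        # same Python seed again ...
run_2 = first_rows()   # ... but NumPy's global RNG has moved on
python_seed_reproducible = np.array_equal(run_1, run_2)

# 2) Seeding NumPy's global RNG makes the default (random_state=None) repeatable.
np.random.seed(10)
run_3 = first_rows()
np.random.seed(10)
run_4 = first_rows()
numpy_seed_reproducible = np.array_equal(run_3, run_4)

# 3) random_state=10 yields the same data as np.random.seed(10).
run_5 = first_rows(random_state=10)
matches_random_state = np.array_equal(run_3, run_5)

print(python_seed_reproducible, numpy_seed_reproducible, matches_random_state)
```

Note that np.random.seed changes global state shared by every piece of code that uses NumPy's legacy global RNG, so passing random_state explicitly is usually the less surprising option.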

For more on RNGs, you may find my own answer in "Are random seeds compatible between systems?" useful.

desertnaut answered Jan 27 '23 07:01
