I came across the following statement when trying to find the difference between train_test_split and StratifiedShuffleSplit.
When stratify is not None, train_test_split uses StratifiedShuffleSplit internally.
I was just wondering why the StratifiedShuffleSplit from sklearn.model_selection is used when we can use the stratify argument available in train_test_split.
Mainly, it is done for the sake of reusability. Rather than duplicating the code already implemented for StratifiedShuffleSplit, train_test_split simply calls that class.
For the same reason, when stratify is None, it uses the model_selection.ShuffleSplit class (see source code).
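To see the delegation in practice, here is a minimal sketch (the toy data and random_state are my own assumptions) showing that passing stratify=y to train_test_split and calling StratifiedShuffleSplit directly both preserve the class proportions in the test fold:

```python
import numpy as np
from sklearn.model_selection import train_test_split, StratifiedShuffleSplit

# Toy imbalanced dataset: 8 samples of class 0, 4 of class 1 (assumption).
X = np.arange(24).reshape(12, 2)
y = np.array([0] * 8 + [1] * 4)

# Convenience wrapper: stratify=y makes it use StratifiedShuffleSplit internally.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# The same split done "by hand" with StratifiedShuffleSplit.
sss = StratifiedShuffleSplit(n_splits=1, test_size=0.25, random_state=0)
train_idx, test_idx = next(sss.split(X, y))

# Both test folds keep the 2:1 class ratio (2 samples of class 0, 1 of class 1).
print(np.bincount(y_te))
print(np.bincount(y[test_idx]))
```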
Please note that duplicating code is considered a bad practice: it is assumed to inflate maintenance costs, and it is also considered defect-prone, as inconsistent changes to code duplicates can lead to unexpected behavior. Here is a reference if you'd like to learn more.
Besides, although they perform the same task, they cannot always be used in the same contexts. For example, train_test_split cannot be passed as the cv argument of sklearn.model_selection.RandomizedSearchCV or sklearn.model_selection.GridSearchCV, whereas StratifiedShuffleSplit can. The reason is that the former is not "an iterable yielding (train, test) splits as arrays of indices", while the latter has a split method that yields (train, test) splits as arrays of indices.
More info here (see parameter cv).
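As a sketch of that second point, a StratifiedShuffleSplit instance can be handed directly to GridSearchCV as cv, because its split method yields (train, test) index arrays. The synthetic dataset, estimator, and parameter grid below are my own illustrative choices:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedShuffleSplit

# A small synthetic binary classification problem (assumption for the demo).
X, y = make_classification(n_samples=100, n_features=5, random_state=0)

# StratifiedShuffleSplit satisfies the cv contract: split() yields
# (train, test) index arrays, and the splits are stratified on y.
cv = StratifiedShuffleSplit(n_splits=5, test_size=0.2, random_state=0)

search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.1, 1.0, 10.0]},
    cv=cv,
)
search.fit(X, y)
print(search.best_params_)
```

Trying the same thing with train_test_split would fail, since it is a function returning already-split arrays rather than a splitter object with a split method.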