If I want a random train/test split, I use the sklearn helper function:
In [1]: from sklearn.model_selection import train_test_split
   ...: train_test_split([1,2,3,4,5,6])
Out[1]: [[1, 6, 4, 2], [5, 3]]
What is the most concise way to get a non-shuffled train/test split, i.e.
[[1,2,3,4], [5,6]]
EDIT: Currently I am using
train, test = data[:int(len(data) * 0.75)], data[int(len(data) * 0.75):]
but I am hoping for something a little nicer. I have opened an issue on sklearn: https://github.com/scikit-learn/scikit-learn/issues/8844
EDIT 2: My PR has been merged. As of scikit-learn version 0.19, you can pass the parameter shuffle=False to train_test_split to obtain a non-shuffled split.
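For versions older than 0.19, or when scikit-learn is not available, the slicing from the first edit can be wrapped in a small helper so the cut index is computed only once. This is only a sketch: the name ordered_split and its default test_size of 0.25 are assumptions chosen to mirror the 75/25 slicing, not part of scikit-learn.

```python
def ordered_split(data, test_size=0.25):
    """Split a sequence into (train, test) without shuffling.

    Hypothetical helper: the name and the default are illustrative only.
    """
    if not 0 < test_size < 1:
        raise ValueError("test_size must be strictly between 0 and 1")
    # Compute the cut index once, the same way the slicing one-liner does.
    cut = int(len(data) * (1 - test_size))
    return data[:cut], data[cut:]


train, test = ordered_split([1, 2, 3, 4, 5, 6])
print(train, test)  # [1, 2, 3, 4] [5, 6]
```

Because it relies only on len() and slicing, the same helper should work on lists, tuples, NumPy arrays and pandas objects that support positional slicing.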
Scikit-learn has TimeSeriesSplit for this. The shuffle parameter controls whether samples are assigned to the train and test sets at random. With shuffle=True you split the data randomly; with shuffle=False the original order is kept.
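To make the idea behind time-series style splitting concrete, here is a hand-rolled sketch of forward-chaining splits, where every training block ends before its test block begins. The function name forward_chaining_splits and the equal-sized folds are assumptions for illustration; this is not scikit-learn's implementation, and its fold sizes may differ from what the library produces.

```python
def forward_chaining_splits(n_samples, n_splits):
    """Yield (train_indices, test_indices) with train always before test.

    Hypothetical helper sketching the expanding-window idea; it is not
    scikit-learn's code.
    """
    fold = n_samples // (n_splits + 1)
    if fold == 0:
        raise ValueError("too many splits for the number of samples")
    for k in range(1, n_splits + 1):
        # The k-th test block starts so that the last block ends at n_samples.
        test_start = n_samples - (n_splits - k + 1) * fold
        train = list(range(test_start))
        test = list(range(test_start, test_start + fold))
        yield train, test


for train, test in forward_chaining_splits(6, 2):
    print(train, test)
# [0, 1] [2, 3]
# [0, 1, 2, 3] [4, 5]
```

Unlike a single non-shuffled split, this produces several train/test pairs, so it fits cross-validation on ordered data better than a one-off holdout.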
The train_test_split() function is provided by the scikit-learn Python package. Usually we do not think much about the effects of using it, because a single line of code divides the dataset into two parts, a train set and a test set. However, using this function without understanding what it does can be dangerous.
sklearn.model_selection.train_test_split: Split arrays or matrices into random train and test subsets.
cross_validation.train_test_split: Quick utility that wraps calls to check_arrays and next(iter(ShuffleSplit(n_samples))), and the application to input data, into a single call for splitting (and optionally subsampling) data in a one-liner. Python lists or tuples occurring in arrays are converted to 1D numpy arrays.
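The description above boils down to "permute the indices, then cut once". A minimal pure-Python sketch of that idea follows. The names split_indices and seed are hypothetical, and rounding the test size up with math.ceil is an assumption of this sketch rather than a statement about the library's internals.

```python
import math
import random


def split_indices(n_samples, test_size=0.25, shuffle=True, seed=None):
    """Return (train_indices, test_indices) for n_samples rows.

    Hypothetical helper: it only illustrates shuffle-then-cut.
    """
    indices = list(range(n_samples))
    if shuffle:
        # A seeded generator makes the permutation reproducible.
        random.Random(seed).shuffle(indices)
    n_test = math.ceil(test_size * n_samples)
    n_train = n_samples - n_test
    return indices[:n_train], indices[n_train:]


# Without shuffling, the cut simply preserves the original order.
print(split_indices(6, shuffle=False))  # ([0, 1, 2, 3], [4, 5])
```

With shuffle=True the two index lists still partition range(n_samples), but which indices land in each list depends on the seed.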
I'm not adding much to Psidom's answer except an easy-to-copy-paste function:
import numpy as np

def non_shuffling_train_test_split(X, y, test_size=0.2):
    # Index of the first test row; everything before it goes to train.
    i = int((1 - test_size) * X.shape[0])
    X_train, X_test = np.split(X, [i])
    y_train, y_test = np.split(y, [i])
    return X_train, X_test, y_train, y_test
Update: At some point this feature became built in, so now you can do:
from sklearn.model_selection import train_test_split
train_test_split(X, y, test_size=0.2, shuffle=False)