To split my data into training and test sets, I'm using the sklearn.cross_validation.train_test_split function.
When I pass my data and labels to this function as lists of lists, it returns the training and test data as separate lists.
I want to get the indices of the training and test elements in the original data list.
Can anyone help me out with this?
Thanks in advance
The scikit-learn Python machine learning library provides an implementation of the train-test split evaluation procedure via the train_test_split() function. The function takes a loaded dataset as input and returns it split into two subsets.
Given two sequences, such as x and y, train_test_split() performs the split and returns four sequences (in this case NumPy arrays) in this order:
x_train: the training part of the first sequence (x)
x_test: the test part of the first sequence (x)
y_train: the training part of the second sequence (y)
y_test: the test part of the second sequence (y)
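As a minimal sketch of that return order, assuming a recent scikit-learn where the function lives in sklearn.model_selection (the toy arrays and the choice of test_size=2 are illustrative, not from the original post):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Illustrative data: 6 samples with 2 features each, built so that
# row i of x is [2*i, 2*i + 1] and the matching label is y[i] = i.
x = np.arange(12).reshape((6, 2))
y = np.arange(6)

# An integer test_size is the absolute number of test samples.
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=2, random_state=0
)

print(x_train.shape, x_test.shape)
print(y_train.shape, y_test.shape)
```

Rows and labels stay paired after the shuffle, so each row of x_train still corresponds to the label at the same position in y_train.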
The train_test_split() function splits data into training and test sets. First, divide the data into features (X) and labels (y); the function then returns X_train, X_test, y_train and y_test.
It is part of the sklearn.model_selection module and splits arrays or matrices into random train and test subsets.
You can supply the index vector as an additional argument. Using the example from sklearn:
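import numpy as np
# sklearn.cross_validation was removed in scikit-learn 0.20; use sklearn.model_selection instead
from sklearn.model_selection import train_test_split
X, y, indices = (0.1 * np.arange(10)).reshape((5, 2)), range(10, 15), range(5)
# indices is passed as a third array, so it is shuffled and split exactly like X and y
X_train, X_test, y_train, y_test, indices_train, indices_test = train_test_split(X, y, indices, test_size=0.33, random_state=42)
indices_train, indices_test
# ([2, 0, 3], [1, 4])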
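An alternative, sketched here under the assumption that the data are NumPy arrays (or can be converted to them), is to split only an array of positions and then index the original data with the result. The names idx_train and idx_test are illustrative:

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = (0.1 * np.arange(10)).reshape((5, 2))
y = np.arange(10, 15)

# Split the row positions 0..n-1 instead of the data itself.
indices = np.arange(len(X))
idx_train, idx_test = train_test_split(
    indices, test_size=0.33, random_state=42
)

# Recover the actual subsets by indexing with the split positions.
X_train, X_test = X[idx_train], X[idx_test]
y_train, y_test = y[idx_train], y[idx_test]

print(idx_train, idx_test)
```

This keeps the indices as the single source of truth, which is handy when several arrays or external records need to be sliced with the same split.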