
Getting indices while using train test split in scikit


To split my data into train and test sets, I'm using the sklearn.cross_validation.train_test_split function.

When I pass my data and labels to this function as lists of lists, it returns the train and test data as two separate lists.

I want to get the indices of the train and test data elements from the original data list.

Can anyone help me out with this?

Thanks in advance

theegala asked Feb 25 '16 08:02





1 Answer

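train_test_split accepts any number of equal-length arrays and splits them all with the same shuffle, so you can supply an index vector as an additional argument. Using the example from the sklearn documentation: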

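import numpy as np
# sklearn.cross_validation was removed in scikit-learn 0.20; use model_selection instead.
from sklearn.model_selection import train_test_split

X, y, indices = (0.1 * np.arange(10)).reshape((5, 2)), range(10, 15), range(5)
# The index vector is split alongside X and y, so it records each row's original position.
X_train, X_test, y_train, y_test, indices_train, indices_test = train_test_split(
    X, y, indices, test_size=0.33, random_state=42)
indices_train, indices_test
# ([2, 0, 3], [1, 4])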
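
Since the question uses lists of lists, a variation on the same idea may be convenient: split only an array of positions and then pick the rows out of the original lists. This is a minimal sketch, not part of the sklearn example; the toy `data` and `labels` values and all variable names are assumptions made for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data: lists of lists, as described in the question.
data = [[0, 1], [2, 3], [4, 5], [6, 7], [8, 9]]
labels = [[10], [11], [12], [13], [14]]

# Split only the positions 0..len(data)-1; the data itself is not passed in.
idx_train, idx_test = train_test_split(
    np.arange(len(data)), test_size=0.33, random_state=42)

# Recover the train and test rows from the original lists by position.
data_train = [data[i] for i in idx_train]
data_test = [data[i] for i in idx_test]
labels_train = [labels[i] for i in idx_train]
labels_test = [labels[i] for i in idx_test]
```

Here `idx_train` and `idx_test` are the indices into the original lists, and the same pair can be reused to slice any other per-sample list of the same length.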
Christian Hirsch answered Sep 17 '22 11:09
