TypeError: Singleton array array(0.2) cannot be considered a valid collection.
X = df.iloc[:, [1,7]].values
y= df.iloc[:,-1].values
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, 0.2)
I am getting this error when calling train_test_split. I am able to train my model with the full X and y values. However, I would like to split my dataframe first and then train and test on the separate parts.
Any help is appreciated.
A not-so-commonly-known fact is that train_test_split can split any number of arrays, not just two (such as X and y). Each array passed in is returned as a "train" part and a "test" part. See the linked docs and the source code for more info.
For example,
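import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

np.random.seed(0)
df1 = pd.DataFrame(np.random.choice(10, (5, 4)), columns=list('ABCD'))
y = df1.pop('C')
z = df1.pop('D')
X = df1
# Three input arrays produce three (train, test) pairs
splits = train_test_split(X, y, z, test_size=0.2)
len(splits)
# 6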
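To make the ordering of those six pieces concrete, here is a minimal sketch that unpacks them by name. The random_state=0 argument is my addition so the split is repeatable; it is not part of the example above.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

np.random.seed(0)
df1 = pd.DataFrame(np.random.choice(10, (5, 4)), columns=list('ABCD'))
y = df1.pop('C')
z = df1.pop('D')
X = df1

# Each input yields a (train, test) pair, in the order the inputs were passed.
# random_state is an assumption added here only for reproducibility.
X_train, X_test, y_train, y_test, z_train, z_test = train_test_split(
    X, y, z, test_size=0.2, random_state=0
)

# With 5 rows and test_size=0.2, one row goes to the test parts.
print(X_train.shape, X_test.shape)

# The same rows are selected from every input, so the indexes line up.
same_rows = list(X_test.index) == list(y_test.index) == list(z_test.index)
print(same_rows)
```

All inputs are shuffled with the same permutation, which is why the test indexes of X, y and z agree.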
In other words, the only way to specify the test size is by passing the keyword argument test_size. All positional arguments are assumed to be collections that are to be split, and in your case, since you call
train_test_split(X, y, 0.2)
the function tries to split 0.2, but since a float is not a collection, the error is raised. The solution is to (as mentioned) specify the keyword argument:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
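For anyone who wants to see both behaviours side by side, here is a small sketch. The X and y arrays are made-up stand-ins for the dataframe columns in the question, and random_state=0 is my addition. The exact wording of the error message may differ between scikit-learn versions, so the sketch only relies on the exception type.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: 10 samples, 2 features.
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

caught = None
try:
    # 0.2 is passed positionally, so it is treated as a third array to split.
    train_test_split(X, y, 0.2)
except TypeError as exc:
    caught = exc
    print("TypeError:", exc)

# Passing the fraction by keyword gives the intended 80/20 split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
print(len(X_train), len(X_test))
```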