How do I create test and train samples from one dataframe with pandas?

I have a fairly large dataset in a pandas DataFrame. How can I split it into two random samples (80% and 20%) for training and testing?

Thanks!

tooty44 asked Jun 10 '14 17:06


People also ask

How do you concatenate train and test data?

If you insist on concatenating the two dataframes, first add a new column called source to each DataFrame. Set its value to 'test' for the rows from test.csv and to 'train' for the rows from the training set. Once you have finished cleaning the combined df, use the source column to split the data again.
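The tag-then-split approach described above can be sketched as follows. This is only an illustration: the column name a, the toy values, and the fillna cleaning step are assumptions made up for the example, not part of the original advice.

```python
import pandas as pd

# Hypothetical stand-ins for the data loaded from the training file and test.csv.
train = pd.DataFrame({"a": [1.0, 2.0, None]})
test = pd.DataFrame({"a": [4.0, None]})

# Tag each row with the frame it came from.
train["source"] = "train"
test["source"] = "test"

# Stack the two frames so cleaning is applied once to both.
combined = pd.concat([train, test], ignore_index=True)

# Placeholder cleaning step (assumed): fill missing values with a constant.
combined["a"] = combined["a"].fillna(0.0)

# Use the tag to recover the two sets, then drop the helper column.
train_clean = combined[combined["source"] == "train"].drop(columns="source")
test_clean = combined[combined["source"] == "test"].drop(columns="source")
```

A constant fill is used here on purpose: a cleaning step that computes statistics over the combined frame (a mean, for instance) would let test rows influence the training data.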


1 Answer

scikit-learn's train_test_split is a good option. It splits both NumPy arrays and pandas DataFrames, and it shuffles the rows by default.

    from sklearn.model_selection import train_test_split

    train, test = train_test_split(df, test_size=0.2)
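If you would rather not depend on scikit-learn, a pandas-only split is also possible. The sketch below is one way to do it, not the method from the answer above: the toy DataFrame and the random_state value are assumptions for illustration, and it relies on the index being unique.

```python
import pandas as pd

# Hypothetical toy frame standing in for the asker's dataset.
df = pd.DataFrame({"x": range(10), "y": [i % 2 for i in range(10)]})

# Draw 80% of the rows at random; random_state makes the draw repeatable.
train = df.sample(frac=0.8, random_state=42)

# Rows whose index labels were not drawn form the test set.
# This assumes df has a unique index; call df.reset_index(drop=True) first if not.
test = df.drop(train.index)
```

train_test_split also accepts a random_state argument if you need the same split on every run.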
o-90 answered Oct 18 '22 21:10