Splitting data into train and test datasets based on time

I know that train_test_split splits the data randomly, but I need to split it based on time.

  from sklearn.model_selection import train_test_split

  X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
  # this splits the data randomly into 67% train and 33% test

How can I split the same dataset based on time, with the oldest 67% as train and the newest 33% as test? The dataset has a TimeStamp column.

I searched for similar questions but was not sure about the approach.

Can someone explain briefly?

asked Jun 15 '18 17:06

dhruv bhardwaj

1 Answer

One easy way to do it:

First, sort the data by time.
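A minimal sketch of the sorting step, assuming data is a pandas DataFrame whose TimeStamp column may hold date strings. The sample rows and the feature/target column names are hypothetical, made up only for illustration.

```python
import pandas as pd

# Hypothetical sample data; the TimeStamp values are deliberately out of order.
data = pd.DataFrame({
    "TimeStamp": ["2018-03-01", "2018-01-15", "2018-02-10", "2018-01-01"],
    "feature": [3.0, 1.5, 2.0, 1.0],
    "target": [0, 1, 0, 1],
})

# Parse the strings into datetime64 values so sorting compares actual dates, not text.
data["TimeStamp"] = pd.to_datetime(data["TimeStamp"])

# Sort oldest to newest, then renumber the index 0..n-1 so positions match time order.
data = data.sort_values("TimeStamp").reset_index(drop=True)
print(data)
```

After this, the row position reflects time order, which is what a positional split relies on.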

Second, split it at the 67% mark:

import numpy as np

data = data.sort_values("TimeStamp")  # step one: oldest rows first
train_set, test_set = np.split(data, [int(0.67 * len(data))])  # cut at the 67% mark

This puts the first (oldest) 67% of the time-sorted rows in train_set and the remaining (newest) 33% in test_set.
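The same idea applied to the X and y from the question, as a sketch that assumes the rows are already sorted by TimeStamp; the 100 daily rows and the feature/target column names are hypothetical. It slices with iloc rather than passing a DataFrame to np.split, which avoids depending on how NumPy handles pandas objects, and it also shows that scikit-learn's train_test_split with shuffle=False keeps the row order, so it gives the same chronological split.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical time-ordered data: 100 daily rows, already sorted by TimeStamp.
data = pd.DataFrame({
    "TimeStamp": pd.date_range("2018-01-01", periods=100, freq="D"),
    "feature": np.arange(100, dtype=float),
    "target": np.arange(100) % 2,
})
X = data[["feature"]]
y = data["target"]

# Position of the first test row: everything before it (the oldest 67%) is training data.
split_idx = int(0.67 * len(data))

# Option 1: positional slicing, no shuffling involved.
X_train, X_test = X.iloc[:split_idx], X.iloc[split_idx:]
y_train, y_test = y.iloc[:split_idx], y.iloc[split_idx:]

# Option 2: train_test_split with shuffle=False keeps the original row order,
# so the first split_idx rows become the training set and the rest the test set.
X_train2, X_test2, y_train2, y_test2 = train_test_split(
    X, y, train_size=split_idx, shuffle=False
)

# Sanity check: every training timestamp precedes every test timestamp.
train_times = data["TimeStamp"].iloc[:split_idx]
test_times = data["TimeStamp"].iloc[split_idx:]
assert train_times.max() < test_times.min()
print(len(X_train), len(X_test))  # 67 33
```

Passing the integer train_size=split_idx, rather than a float test_size, guarantees that both options cut at exactly the same row.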

answered Sep 23 '22 07:09

zetadaro

Tags: python, timestamp, scikit-learn, train-test-split