
Split dataframe into two on the basis of date

Tags: python, dataset

I have a dataset with 1000 rows like this:

    Date,       Cost, Quantity(in ton), Source, Unloading Station
    01/10/2015, 7,    5.416,            XYZ,    ABC

I want to split the data on the basis of date. For example, everything up to 20.12.2016 is training data, and everything after that is test data.

How should I split it? Is it possible?

kush asked May 30 '16 18:05





3 Answers

You can do that by converting the column to datetime with pandas' to_datetime and setting it as the index.

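import pandas as pd

# Parse the Date strings into datetimes (pass dayfirst=True if they are day/month/year)
df['Date'] = pd.to_datetime(df['Date'])
# Use the dates as the index and sort chronologically
df = df.set_index(df['Date'])
df = df.sort_index()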

Once the data is in this form, you can slice the index by date to create the partitions:

# create train/test partitions (date slices on a DatetimeIndex include both endpoints)
train = df.loc['2015-01-10':'2016-12-20']
test  = df.loc['2016-12-21':]
print('Train Dataset:',train.shape)
print('Test Dataset:',test.shape)
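
For reference, here is a minimal self-contained sketch of the same index-based split. The rows are made up to mirror the question's columns, and it assumes the dates are day/month/year (as the 20.12.2016 in the question suggests), so the format is passed explicitly rather than left for pandas to guess.

```python
import pandas as pd

# Made-up rows in the question's layout; the real data has about 1000 rows.
df = pd.DataFrame({
    "Date": ["01/10/2015", "20/12/2016", "21/12/2016", "03/02/2017"],
    "Cost": [7, 6, 8, 5],
    "Quantity(in ton)": [5.416, 4.2, 6.1, 3.9],
})

# Assumption: day/month/year strings. Change the format if yours differ.
df["Date"] = pd.to_datetime(df["Date"], format="%d/%m/%Y")
df = df.set_index("Date").sort_index()

# Date-string slices on a sorted DatetimeIndex include both endpoints,
# so 20 December 2016 lands in train and 21 December 2016 starts test.
train = df.loc[:"2016-12-20"]
test = df.loc["2016-12-21":]

print("Train Dataset:", train.shape)
print("Test Dataset:", test.shape)
```

Leaving the start of the first slice open avoids hard-coding the earliest date in the data.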
Sayali Sonawane answered Oct 23 '22 07:10



Assuming that your dataset is a pandas DataFrame and that the Date column is of datetime dtype:

split_date = pd.Timestamp(2016, 12, 20)  # pd.datetime was removed in pandas 2.0

df_training = df.loc[df['Date'] <= split_date]
df_test = df.loc[df['Date'] > split_date]
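
The same comparison can be wrapped in a small function if you need to split more than once. This is only a sketch: `split_by_date` is a hypothetical helper name, and the three rows are invented for illustration.

```python
import pandas as pd


def split_by_date(df, column, split_date):
    """Hypothetical helper: return (rows on or before split_date, rows after it)."""
    cutoff = pd.Timestamp(split_date)
    before = df.loc[df[column] <= cutoff]
    after = df.loc[df[column] > cutoff]
    return before, after


# Invented sample data with a datetime64 Date column.
df = pd.DataFrame({
    "Date": pd.to_datetime(["2015-10-01", "2016-12-20", "2016-12-21"]),
    "Cost": [7, 6, 8],
})

df_training, df_test = split_by_date(df, "Date", "2016-12-20")
print(len(df_training), len(df_test))
```

Note that the cutoff is midnight at the start of 2016-12-20, so if the column carries times of day, rows later on that date fall into the test set.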
MaxU - stop WAR against UA answered Oct 23 '22 07:10



If your dates are in the standard Python datetime string format, i.e. '2016-06-23 23:00:00', you can use the code below:

split_date = '2016-06-23 23:00:00'
# Take the validation rows first: once train_data is reassigned, the later rows are gone
validation_data = train_data.loc[train_data['Date'] > split_date]
train_data = train_data.loc[train_data['Date'] <= split_date]
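
A caveat worth illustrating: comparing against a date string is only safe when the column holds real datetimes or year-first strings. The sketch below uses invented day/month/year strings, like the one in the question, to show how a plain text comparison can go wrong and how parsing first avoids it.

```python
import pandas as pd

# Invented day/month/year strings, the layout shown in the question.
dates = pd.Series(["01/10/2015", "20/12/2016", "05/01/2017"])

# Text comparison is character by character, so "05/01/2017" sorts
# before "20/12/2016" even though it is the later date.
as_text = dates <= "20/12/2016"

# Parsing to datetimes first gives a chronological comparison.
parsed = pd.to_datetime(dates, format="%d/%m/%Y")
as_dates = parsed <= pd.Timestamp(2016, 12, 20)

print(as_text.tolist())
print(as_dates.tolist())
```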

Kaustubh Kulkarni answered Oct 23 '22 05:10
