I have a dataset with 1000 rows like this:
Date, Cost, Quantity(in ton), Source, Unloading Station
01/10/2015, 7, 5.416, XYZ, ABC
I want to split the data based on the date. For example, everything up to 20.12.2016 is training data and everything after that is test data.
How should I split it? Is it possible?
Using iloc to split a DataFrame in Python
We can use the iloc indexer to slice a DataFrame into smaller DataFrames. iloc selects rows and columns by integer position, so it can split a DataFrame by rows or by columns. To split by date this way, the rows must first be sorted by date.
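A minimal sketch of the iloc approach applied to the question. The sample rows are made up, and the dates are assumed to be day-first (dd/mm/yyyy). Finding the cut position with searchsorted is one way to do it, not the only one.

```python
import pandas as pd

# Made-up sample rows; only the column names come from the question.
df = pd.DataFrame({
    "Date": ["01/10/2015", "15/06/2016", "20/12/2016", "21/12/2016", "05/01/2017"],
    "Cost": [7, 8, 6, 9, 7],
})

# Assumption: dates are day-first (dd/mm/yyyy).
df["Date"] = pd.to_datetime(df["Date"], format="%d/%m/%Y")
df = df.sort_values("Date").reset_index(drop=True)

# Position of the first row strictly after the split date.
split_date = pd.Timestamp("2016-12-20")
cut = int(df["Date"].searchsorted(split_date, side="right"))

train = df.iloc[:cut]  # rows up to and including 2016-12-20
test = df.iloc[cut:]   # rows after 2016-12-20
print(train.shape, test.shape)
```

Because iloc works on positions, the sort step matters: without it the cut position would not correspond to a date boundary.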
The div() method performs element-wise division of one pandas DataFrame by another. A DataFrame can also be divided by a pandas Series or by a Python sequence. Calling div() on a DataFrame is equivalent to using the division operator (/). Note that it divides values; it does not split a DataFrame into parts.
You can do that by converting the Date column to datetime with pd.to_datetime and setting it as the index.
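import pandas as pd
# parse the Date column; pass dayfirst=True if the dates are dd/mm/yyyy
df['Date'] = pd.to_datetime(df['Date'])
# use the dates as the index and sort chronologically
df = df.set_index('Date')
df = df.sort_index()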
Once your data is in this format, you can use the date index to create the partition as follows:
# create train/test partition (label slices on a DatetimeIndex include the end date)
train = df.loc[:'2016-12-20']
test = df.loc['2016-12-21':]
print('Train Dataset:', train.shape)
print('Test Dataset:', test.shape)
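For reference, a self-contained sketch of the same idea on made-up rows. It assumes the dates are day-first, as the question's "20.12.2016" suggests. This matters because pd.to_datetime reads "01/10/2015" as 10 January unless told otherwise.

```python
import pandas as pd

# Made-up rows; column names follow the question.
df = pd.DataFrame({
    "Date": ["01/10/2015", "20/12/2016", "21/12/2016", "03/02/2017"],
    "Cost": [7, 6, 9, 8],
})

# Assumption: the data is day-first, so give the format explicitly.
df["Date"] = pd.to_datetime(df["Date"], format="%d/%m/%Y")
df = df.set_index("Date").sort_index()

# Label slices on a sorted DatetimeIndex include the end date.
train = df.loc[:"2016-12-20"]
test = df.loc["2016-12-21":]
print("Train Dataset:", train.shape)
print("Test Dataset:", test.shape)
```

Every row lands in exactly one of the two frames, since the train slice covers all of 20 December and the test slice starts at midnight on 21 December.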
Assuming that your dataset is a pandas DataFrame and that the Date column is of datetime dtype:
split_date = pd.Timestamp(2016, 12, 20)  # pd.datetime was removed in pandas 2.0
df_training = df.loc[df['Date'] <= split_date]
df_test = df.loc[df['Date'] > split_date]
If your dates are in the standard Python datetime string format, i.e. '2016-06-23 23:00:00', you can use the code below:
split_date = '2016-06-23 23:00:00'
# take the later rows first, because train_data is overwritten on the next line
validation_data = train_data.loc[train_data['Date'] > split_date]
train_data = train_data.loc[train_data['Date'] <= split_date]