Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Add months to a date in Pandas

I'm trying to figure out how to add 3 months to a date in a Pandas dataframe, while keeping it in the date format, so I can use it to lookup a range.

This is what I've tried:

#create dataframe df = pd.DataFrame([pd.Timestamp('20161011'),                    pd.Timestamp('20161101') ], columns=['date'])  #create a future month period plus_month_period = 3  #calculate date + future period df['future_date'] = plus_month_period.astype("timedelta64[M]") 

However, I get the following error:

AttributeError: 'int' object has no attribute 'astype' 
like image 874
mattblack Avatar asked Oct 14 '17 06:10

mattblack


People also ask

How do I add months to pandas?

In pandas, a string is converted to a datetime object using the pd. to_datetime() method and pd. DateOffset() method is used to add months to the created pandas object.


2 Answers

Suppose you have a dataframe of the following format, where you have to add integer months to a date column.

Start_Date Months_to_add
2014-06-01 23
2014-06-01 4
2000-10-01 10
2016-07-01 3
2017-12-01 90
2019-01-01 2

In such a scenario, using Zero's code or mattblack's code won't be useful. You have to use lambda function over the rows where the function takes 2 arguments -

  1. A date to which months need to be added to
  2. A month value in integer format

You can use the following function:

# Importing required modules from dateutil.relativedelta import relativedelta  # Defining the function def add_months(start_date, delta_period):   end_date = start_date + relativedelta(months=delta_period)   return end_date 

After this you can use the following code snippet to add months to the Start_Date column. Use progress_apply functionality of Pandas. Refer to this Stackoverflow answer on progress_apply : Progress indicator during pandas operations.

from tqdm import tqdm tqdm.pandas()  df["End_Date"] = df.progress_apply(lambda row: add_months(row["Start_Date"], row["Months_to_add"]), axis = 1) 

Here's the full code form dataset creation, for your reference:

import pandas as pd from dateutil.relativedelta import relativedelta from tqdm import tqdm tqdm.pandas()  # Initilize a new dataframe df = pd.DataFrame()  # Add Start Date column df["Start_Date"] = ['2014-06-01T00:00:00.000000000',                     '2014-06-01T00:00:00.000000000',                     '2000-10-01T00:00:00.000000000',                     '2016-07-01T00:00:00.000000000',                     '2017-12-01T00:00:00.000000000',                     '2019-01-01T00:00:00.000000000']  # To convert the date column to a datetime format df["Start_Date"] = pd.to_datetime(df["Start_Date"]) # Add months column df["Months_to_add"] = [23, 4, 10, 3, 90, 2]  # Defining the Add Months function def add_months(start_date, delta_period):   end_date = start_date + relativedelta(months=delta_period)   return end_date  # Apply function on the dataframe using lambda operation. df["End_Date"] = df.progress_apply(lambda row: add_months(row["Start_Date"], row["Months_to_add"]), axis = 1) 

You will have the final output dataframe as follows.

Start_Date Months_to_add End_Date
2014-06-01 23 2016-05-01
2014-06-01 4 2014-10-01
2000-10-01 10 2001-08-01
2016-07-01 3 2016-10-01
2017-12-01 90 2025-06-01
2019-01-01 2 2019-03-01

Please add to comments if there are any issues with the above code.
All the best!

like image 21
Aruparna Maity Avatar answered Sep 29 '22 20:09

Aruparna Maity


You could use pd.DateOffset

In [1756]: df.date + pd.DateOffset(months=plus_month_period) Out[1756]: 0   2017-01-11 1   2017-02-01 Name: date, dtype: datetime64[ns] 

Another way using pd.offsets.MonthOffset

In [1785]: df.date + pd.offsets.MonthOffset(plus_month_period) Out[1785]: 0   2016-10-14 1   2016-11-04 Name: date, dtype: datetime64[ns] 

Details

In [1757]: df Out[1757]:         date 0 2016-10-11 1 2016-11-01  In [1758]: plus_month_period Out[1758]: 3 
like image 100
Zero Avatar answered Sep 29 '22 19:09

Zero