I have a pandas dataframe with two columns, a date column and an int column, and I'd simply like to add the int column (in days) to the date column. I found a solution using df.apply(), but that was too slow on my full dataset. I don't see a ton of documentation on doing this in a vectorized manner (the closest I could find was this ), so I wanted to make sure the solution I found was the best way to go forward.
My raw data is just a column of strings as a column of ints (days).
import pandas as pd
from datetime import timedelta
df = pd.DataFrame([['2016-01-10',28],['2016-05-11',28],['2016-02-23',15],['2015-12-08',30]], 
                  columns = ['ship_string','days_supply'])
print df  
ship_string  days_supply
0  2016-01-10           28
1  2016-05-11           28
2  2016-02-23           15
3  2015-12-08           30
My first thought (which worked) was to use .apply as follows:
def f(x):    
    return x['ship_date'] + timedelta(days=x['days_supply'] )
df['ship_date'] = pd.to_datetime(df['ship_string'])
df['supply_ended'] = df.apply(f,axis = 1)
That worked, but is exceedingly slow. I've posted my alternate solution below as an answer to the question, but I'd like to get confirmation that it is "best practice". I couldn't find many good threads on adding timedelta columns to dates in pandas (especially in a vectorized manner), so thought I'd add one that is a little bit more user friendly and hopefully it will help the next poor soul trying to do this.
Full code solution:
import pandas as pd
from datetime import timedelta
df = pd.DataFrame([['2016-01-10',28],['2016-05-11',28],['2016-02-23',15],['2015-12-08',30]], 
                      columns = ['ship_string','days_supply'])
df['ship_date'] = pd.to_datetime(df['ship_string'])
df['time_added'] = pd.to_timedelta(df['days_supply'],'d')
df['supply_ended'] = df['ship_date'] + df['time_added']
print df
  ship_string  days_supply  ship_date  time_added supply_ended
0  2016-01-10           28 2016-01-10     28 days   2016-02-07
1  2016-05-11           28 2016-05-11     28 days   2016-06-08
2  2016-02-23           15 2016-02-23     15 days   2016-03-09
3  2015-12-08           30 2015-12-08     30 days   2016-01-07
Please let me know in the comments below if this isn't a good vectorized solution and i'll edit.
df['supply_ended'] = df['ship_date']) + pd.to_timedelta(df['days_supply'],'D')
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With