Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Creating new rows in df based on date range

Tags:

python

pandas

Below is script for a simplified version of the df in question:

import pandas as pd
import numpy as np
from datetime import date
from datetime import datetime

df = pd.DataFrame({'date':pd.date_range(start='2020-01-01', end='2020/01/07'), 
                   'id' : range(1,8), 
                   'product': ['list_3','list_1','list_2', 'list_3','list_2','list_1','list_1'],
                  'duration' : [3,1,2,3,2,1,1],
                  'product_in_use': ('true'),
                  'end_date':['2020-01-03','2020-01-02','2020-01-04','2020-01-06','2020-01-06','2020-01-06',
                                '2020-01-07']})

df['duration']= pd.to_timedelta(df['duration'], unit='D')
df['date'] = pd.to_datetime(df['date'])
df['end_date'] = pd.to_datetime(df['end_date'])
df

df:

    date       id   product duration product_in_use end_date
0   2020-01-01  1   list_3  3 days     true        2020-01-03
1   2020-01-02  2   list_1  1 days     true        2020-01-02
2   2020-01-03  3   list_2  2 days     true        2020-01-04
3   2020-01-04  4   list_3  3 days     true        2020-01-06
4   2020-01-05  5   list_2  2 days     true        2020-01-06
5   2020-01-06  6   list_1  1 days     true        2020-01-06
6   2020-01-07  7   list_1  1 days     true        2020-01-07

As you can see in the df above, each id is using a product and each product lasts for a particular duration. There are no rows for the days when the product is in action, only rows for when the product was purchased by the user.

Therefore, I would like to create new rows, for all dates where the product is in use for each id.

My intended df would therefore be this:

    date       id   product  duration   product_in_use
0   2020-01-01  1   list_3   3 days         true
1   2020-01-02  1   list_3   3 days         true
2   2020-01-03  1   list_3   3 days         true
3   2020-01-02  2   list_1   1 days         true
4   2020-01-03  3   list_2   2 days         true
5   2020-01-04  3   list_2   2 days         true
6   2020-01-04  4   list_3   3 days         true
7   2020-01-05  4   list_3   3 days         true
8   2020-01-06  4   list_3   3 days         true
9   2020-01-05  5   list_2   3 days         true
10  2020-01-06  5   list_2   2 days         true
11  2020-01-06  6   list_1   2 days         true
12  2020-01-07  7   list_1   1 days         true
like image 916
LeoGER Avatar asked Sep 11 '25 17:09

LeoGER


1 Answers

create date ranges from the start date to the end date for each id, using starmap and chain, expand the main dataframe by the duration, and assign the new dates as index of the dataframe.

from itertools import starmap,chain

#create date ranges from date to end_date for each id
start_end = zip(df.date.array,df.end_date.array)
date_ranges = starmap(pd.date_range,start_end)
date_ranges = chain.from_iterable(date_ranges)

#get all columns except date and end_date
res = df.filter(['id','product','duration','product_in_use'])

#expand the dataframe by repeating the indexes based on the duration
#so index 0 will be repeated 3 times, 1 once, 2 twice, ...
res = res.reindex(res.index.repeat(res.duration.dt.days))

#assign the new date_ranges to the dataframe
res.index = date_ranges
res

           id   product duration    product_in_use
2020-01-01  1   list_3    3 days    true
2020-01-02  1   list_3    3 days    true
2020-01-03  1   list_3    3 days    true
2020-01-02  2   list_1    1 days    true
2020-01-03  3   list_2    2 days    true
2020-01-04  3   list_2    2 days    true
2020-01-04  4   list_3    3 days    true
2020-01-05  4   list_3    3 days    true
2020-01-06  4   list_3    3 days    true
2020-01-05  5   list_2    2 days    true
2020-01-06  5   list_2    2 days    true
2020-01-06  6   list_1    1 days    true
2020-01-07  7   list_1    1 days    true
like image 115
sammywemmy Avatar answered Sep 14 '25 07:09

sammywemmy