Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to "unroll" time intervals in a dataframe?

Tags:

python

pandas

I have a dataframe:

df1 = pd.DataFrame(
    [['2011-01-01','2011-01-03','A'], ['2011-04-01','2011-04-01','A'], ['2012-08-28','2012-08-30','B'], ['2015-04-03','2015-04-05','A'], ['2015-08-21','2015-08-21','B']],
    columns=['d0', 'd1', 'event'])
           d0          d1 event
0  2011-01-01  2011-01-03     A
1  2011-04-01  2011-04-01     A
2  2012-08-28  2012-08-30     B
3  2015-04-03  2015-04-05     A
4  2015-08-21  2015-08-21     B

It contains some events A and B that occurred in the specified interval from d0 to d1. (There are actually more events, they are mixed, but they have no intersection by dates.) Moreover, this interval can be 1 day (d0 = d1). I need to go from df1 to df2 in which these time intervals are "unrolled" for each event, i.e.:

df2 = pd.DataFrame(
    [['2011-01-01','A'], ['2011-01-02','A'], ['2011-01-03','A'], ['2011-04-01','A'], ['2012-08-28','B'], ['2012-08-29','B'], ['2012-08-30','B'], ['2015-04-03','A'], ['2015-04-04','A'], ['2015-04-05','A'], ['2015-08-21','B']],
    columns=['Date', 'event'])
          Date event
0   2011-01-01     A
1   2011-01-02     A
2   2011-01-03     A
3   2011-04-01     A
4   2012-08-28     B
5   2012-08-29     B
6   2012-08-30     B
7   2015-04-03     A
8   2015-04-04     A
9   2015-04-05     A
10  2015-08-21     B

I tried doing this based on resample and comparing areas where ffill = bfill but couldn't come up with anything. How can this be done in the most simple way?

like image 321
Антон Avatar asked Oct 31 '21 12:10

Антон


3 Answers

We can set_index to event then create date_range per row, then explode to unwind the ranges and reset_index to create the DataFrame:

df2 = (
    df1.set_index('event')
        .apply(lambda r: pd.date_range(r['d0'], r['d1']), axis=1)
        .explode()
        .reset_index(name='Date')[['Date', 'event']]
)

df2:

         Date event
0  2011-01-01     A
1  2011-01-02     A
2  2011-01-03     A
3  2011-04-01     A
4  2012-08-28     B
5  2012-08-29     B
6  2012-08-30     B
7  2015-04-03     A
8  2015-04-04     A
9  2015-04-05     A
10 2015-08-21     B
like image 86
Henry Ecker Avatar answered Oct 21 '22 14:10

Henry Ecker


Let us try comprehension to create the pairs of date and event

pd.DataFrame(((d, c) for (*v, c) in df1.to_numpy()
              for d in pd.date_range(*v)), columns=['Date', 'Event'])

         Date Event
0  2011-01-01     A
1  2011-01-02     A
2  2011-01-03     A
3  2011-04-01     A
4  2012-08-28     B
5  2012-08-29     B
6  2012-08-30     B
7  2015-04-03     A
8  2015-04-04     A
9  2015-04-05     A
10 2015-08-21     B
like image 6
Shubham Sharma Avatar answered Oct 21 '22 12:10

Shubham Sharma


I don't know if this is the "most simple," but it's the most intuitive way I can think to do it. I iterate over the rows and unroll it manually into a new dataframe. This means that I look at each row, iterate over the dates between d0 and d1, and construct a row for each of them and compile them into a dataframe:

from datetime import timedelta

def unroll_events(df):
    rows = []
    for _, row in df.iterrows():
        event = row['event']
        start = row['d0']
        end = row['d1']
        current = start
        while current != end:
            rows.append(dict(Date=current, event=event))
            current += timedelta(days=1)
        rows.append(dict(Date=current, event=event)) # make sure last one is included
    return pd.DataFrame(rows)
like image 1
dsillman2000 Avatar answered Oct 21 '22 13:10

dsillman2000