Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

pandas multiple date ranges from column of dates

Current df:

ID  Date
11  3/19/2018
22  1/5/2018
33  2/12/2018
..  ..

I have the df with ID and Date. ID is unique in the original df. I would like to create a new df based on date. Each ID has a Max Date, I would like to use that date and go back 4 days(5 rows each ID) There are thousands of IDs.

Expect to get:

ID  Date
11  3/15/2018
11  3/16/2018
11  3/17/2018
11  3/18/2018
11  3/19/2018
22  1/1/2018
22  1/2/2018
22  1/3/2018
22  1/4/2018
22  1/5/2018
33  2/8/2018
33  2/9/2018
33  2/10/2018
33  2/11/2018
33  2/12/2018
…   …

I tried the following method, i think use date_range might be right direction, but I keep get error.

pd.date_range

def date_list(row):
    list = pd.date_range(row["Date"], periods=5)
    return list

df["Date_list"] = df.apply(date_list, axis = "columns")
like image 395
qqqwww Avatar asked May 16 '18 21:05

qqqwww


2 Answers

Here is another by using df.assign to overwrite date and pd.concat to glue the range together. cᴏʟᴅsᴘᴇᴇᴅ's solution wins in performance but I think this might be a nice addition as it is quite easy to read and understand.

df = pd.concat([df.assign(Date=df.Date - pd.Timedelta(days=i)) for i in range(5)])

Alternative:

dates = (pd.date_range(*x) for x in zip(df['Date']-pd.Timedelta(days=4), df['Date']))

df = (pd.DataFrame(dict(zip(df['ID'],dates)))
        .T
        .stack()
        .reset_index(0)
        .rename(columns={'level_0': 'ID', 0: 'Date'}))

Full example:

import pandas as pd

data = '''\
ID  Date
11  3/19/2018
22  1/5/2018
33  2/12/2018'''

# Recreate dataframe
df = pd.read_csv(pd.compat.StringIO(data), sep='\s+')
df['Date']= pd.to_datetime(df.Date)

df = pd.concat([df.assign(Date=df.Date - pd.Timedelta(days=i)) for i in range(5)])
df.sort_values(by=['ID','Date'], ascending = [True,True], inplace=True)
print(df)

Returns:

   ID       Date
0  11 2018-03-15
0  11 2018-03-16
0  11 2018-03-17
0  11 2018-03-18
0  11 2018-03-19
1  22 2018-01-01
1  22 2018-01-02
1  22 2018-01-03
1  22 2018-01-04
1  22 2018-01-05
2  33 2018-02-08
2  33 2018-02-09
2  33 2018-02-10
2  33 2018-02-11
2  33 2018-02-12
like image 114
Anton vBR Avatar answered Oct 13 '22 14:10

Anton vBR


reindexing with pd.date_range

Let's try creating a flat list of date-ranges and reindexing this DataFrame.

from itertools import chain

v = df.assign(Date=pd.to_datetime(df.Date)).set_index('Date')
# assuming ID is a string column
v.reindex(chain.from_iterable(
    pd.date_range(end=i, periods=5) for i in v.index)
).bfill().reset_index()  

         Date  ID
0  2018-03-14  11
1  2018-03-15  11
2  2018-03-16  11
3  2018-03-17  11
4  2018-03-18  11
5  2018-03-19  11
6  2017-12-31  22
7  2018-01-01  22
8  2018-01-02  22
9  2018-01-03  22
10 2018-01-04  22
11 2018-01-05  22
12 2018-02-07  33
13 2018-02-08  33
14 2018-02-09  33
15 2018-02-10  33
16 2018-02-11  33
17 2018-02-12  33

concat based solution on keys

Just for fun. My reindex solution is definitely more performant and easier to read, so if you were to pick one, use that.

v = df.assign(Date=pd.to_datetime(df.Date))
v_dict = {
    j : pd.DataFrame(
            pd.date_range(end=i, periods=5), columns=['Date']
        ) 
    for j, i in zip(v.ID, v.Date)
}

(pd.concat(v_dict, axis=0)
  .reset_index(level=1, drop=True)
  .rename_axis('ID')
  .reset_index()
)

    ID       Date
0   11 2018-03-14
1   11 2018-03-15
2   11 2018-03-16
3   11 2018-03-17
4   11 2018-03-18
5   11 2018-03-19
6   22 2017-12-31
7   22 2018-01-01
8   22 2018-01-02
9   22 2018-01-03
10  22 2018-01-04
11  22 2018-01-05
12  33 2018-02-07
13  33 2018-02-08
14  33 2018-02-09
15  33 2018-02-10
16  33 2018-02-11
17  33 2018-02-12
like image 35
cs95 Avatar answered Oct 13 '22 12:10

cs95