I have a pandas.DataFrame containing start and end columns, plus a couple of additional columns. I would like to expand this dataframe into a time series that starts at the start values and ends at the end values, while copying my other columns. So far I came up with the following:
import pandas as pd
import datetime as dt
df = pd.DataFrame()
df['start'] = [dt.datetime(2017, 4, 3), dt.datetime(2017, 4, 5), dt.datetime(2017, 4, 10)]
df['end'] = [dt.datetime(2017, 4, 10), dt.datetime(2017, 4, 12), dt.datetime(2017, 4, 17)]
df['country'] = ['US', 'EU', 'UK']
df['letter'] = ['a', 'b', 'c']
data_series = list()
for row in df.itertuples():
    time_range = pd.bdate_range(row.start, row.end)
    s = len(time_range)
    data_series += (zip(time_range, [row.start]*s, [row.end]*s, [row.country]*s, [row.letter]*s))
columns_names = ['date', 'start', 'end', 'country', 'letter']
df = pd.DataFrame(data_series, columns=columns_names)
Starting Dataframe:
       start        end country letter
0 2017-04-03 2017-04-10      US      a
1 2017-04-05 2017-04-12      EU      b
2 2017-04-10 2017-04-17      UK      c
Desired output:
         date      start        end country letter
0  2017-04-03 2017-04-03 2017-04-10      US      a
1  2017-04-04 2017-04-03 2017-04-10      US      a
2  2017-04-05 2017-04-03 2017-04-10      US      a
3  2017-04-06 2017-04-03 2017-04-10      US      a
4  2017-04-07 2017-04-03 2017-04-10      US      a
5  2017-04-10 2017-04-03 2017-04-10      US      a
6  2017-04-05 2017-04-05 2017-04-12      EU      b
7  2017-04-06 2017-04-05 2017-04-12      EU      b
8  2017-04-07 2017-04-05 2017-04-12      EU      b
9  2017-04-10 2017-04-05 2017-04-12      EU      b
10 2017-04-11 2017-04-05 2017-04-12      EU      b
11 2017-04-12 2017-04-05 2017-04-12      EU      b
12 2017-04-10 2017-04-10 2017-04-17      UK      c
13 2017-04-11 2017-04-10 2017-04-17      UK      c
14 2017-04-12 2017-04-10 2017-04-17      UK      c
15 2017-04-13 2017-04-10 2017-04-17      UK      c
16 2017-04-14 2017-04-10 2017-04-17      UK      c
17 2017-04-17 2017-04-10 2017-04-17      UK      c
The problem with my solution is that, when applied to a much bigger dataframe (mostly in terms of rows), it is not fast enough for me. Does anybody have any ideas on how I could improve it? I am also considering solutions in numpy.
Inspired by @StephenRauch's solution I'd like to post mine (which is pretty similar):
import numpy as np
# df is the starting DataFrame (start, end, country, letter)
dates = [pd.bdate_range(r[0],r[1]).to_series() for r in df[['start','end']].values]
lens = [len(x) for x in dates]
r = pd.DataFrame(
    {col:np.repeat(df[col].values, lens) for col in df.columns}
).assign(date=np.concatenate(dates))
Result:
In [259]: r
Out[259]:
   country        end letter      start       date
0       US 2017-04-10      a 2017-04-03 2017-04-03
1       US 2017-04-10      a 2017-04-03 2017-04-04
2       US 2017-04-10      a 2017-04-03 2017-04-05
3       US 2017-04-10      a 2017-04-03 2017-04-06
4       US 2017-04-10      a 2017-04-03 2017-04-07
5       US 2017-04-10      a 2017-04-03 2017-04-10
6       EU 2017-04-12      b 2017-04-05 2017-04-05
7       EU 2017-04-12      b 2017-04-05 2017-04-06
8       EU 2017-04-12      b 2017-04-05 2017-04-07
9       EU 2017-04-12      b 2017-04-05 2017-04-10
10      EU 2017-04-12      b 2017-04-05 2017-04-11
11      EU 2017-04-12      b 2017-04-05 2017-04-12
12      UK 2017-04-17      c 2017-04-10 2017-04-10
13      UK 2017-04-17      c 2017-04-10 2017-04-11
14      UK 2017-04-17      c 2017-04-10 2017-04-12
15      UK 2017-04-17      c 2017-04-10 2017-04-13
16      UK 2017-04-17      c 2017-04-10 2017-04-14
17      UK 2017-04-17      c 2017-04-10 2017-04-17
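The version above still calls pd.bdate_range once per row. If that call turns out to be the bottleneck, a fully numpy-based variant might be worth trying. The following is only a sketch, not a benchmarked solution: the function name expand_business_days is made up for illustration, it assumes end >= start in every row and that no "date" column exists yet, and it relies on the default Monday-to-Friday week of np.is_busday (no holidays), which matches the default behaviour of pd.bdate_range.

```python
import numpy as np
import pandas as pd

def expand_business_days(frame, start_col="start", end_col="end"):
    # Hypothetical helper: one output row per business day in [start, end].
    starts = frame[start_col].values.astype("datetime64[D]")
    ends = frame[end_col].values.astype("datetime64[D]")
    # Number of calendar days in each inclusive range (assumes end >= start).
    n_days = (ends - starts).astype(np.int64) + 1
    # Positional row number, repeated once per calendar day of that row.
    row_idx = np.repeat(np.arange(len(frame)), n_days)
    # Offset of every day inside its own range: 0, 1, ..., n-1 for each row.
    first_pos = np.cumsum(n_days) - n_days
    offsets = np.arange(n_days.sum()) - np.repeat(first_pos, n_days)
    dates = starts[row_idx] + offsets.astype("timedelta64[D]")
    # Keep Monday-to-Friday only.
    keep = np.is_busday(dates)
    out = frame.iloc[row_idx[keep]].reset_index(drop=True)
    out.insert(0, "date", dates[keep].astype("datetime64[ns]"))
    return out

df = pd.DataFrame({
    "start": pd.to_datetime(["2017-04-03", "2017-04-05", "2017-04-10"]),
    "end": pd.to_datetime(["2017-04-10", "2017-04-12", "2017-04-17"]),
    "country": ["US", "EU", "UK"],
    "letter": ["a", "b", "c"],
})
result = expand_business_days(df)
```

The trade-off is that every calendar day is generated first and weekends are filtered out afterwards, so it builds roughly 7/5 as many intermediate values as needed in exchange for having no Python-level loop.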