
Expanding a dataframe based on start and end columns (speed)

I have a pandas.DataFrame containing start and end columns, plus a couple of additional columns. I would like to expand this dataframe into a time series of business days that starts at the start value and ends at the end value of each row, copying my other columns onto every generated row. So far I came up with the following:

import pandas as pd
import datetime as dt

df = pd.DataFrame()
df['start'] = [dt.datetime(2017, 4, 3), dt.datetime(2017, 4, 5), dt.datetime(2017, 4, 10)]
df['end'] = [dt.datetime(2017, 4, 10), dt.datetime(2017, 4, 12), dt.datetime(2017, 4, 17)]
df['country'] = ['US', 'EU', 'UK']
df['letter'] = ['a', 'b', 'c']

# Collect one tuple per (business day, original row) pair
data_series = []
for row in df.itertuples():
    time_range = pd.bdate_range(row.start, row.end)  # business days, both ends inclusive
    s = len(time_range)
    # Repeat the row's values once per date in the range
    data_series.extend(zip(time_range, [row.start]*s, [row.end]*s, [row.country]*s, [row.letter]*s))

columns_names = ['date', 'start', 'end', 'country', 'letter']
df = pd.DataFrame(data_series, columns=columns_names)

Starting Dataframe:

       start        end country letter
0 2017-04-03 2017-04-10      US      a
1 2017-04-05 2017-04-12      EU      b
2 2017-04-10 2017-04-17      UK      c

Desired output:

         date      start        end country letter
0  2017-04-03 2017-04-03 2017-04-10      US      a
1  2017-04-04 2017-04-03 2017-04-10      US      a
2  2017-04-05 2017-04-03 2017-04-10      US      a
3  2017-04-06 2017-04-03 2017-04-10      US      a
4  2017-04-07 2017-04-03 2017-04-10      US      a
5  2017-04-10 2017-04-03 2017-04-10      US      a
6  2017-04-05 2017-04-05 2017-04-12      EU      b
7  2017-04-06 2017-04-05 2017-04-12      EU      b
8  2017-04-07 2017-04-05 2017-04-12      EU      b
9  2017-04-10 2017-04-05 2017-04-12      EU      b
10 2017-04-11 2017-04-05 2017-04-12      EU      b
11 2017-04-12 2017-04-05 2017-04-12      EU      b
12 2017-04-10 2017-04-10 2017-04-17      UK      c
13 2017-04-11 2017-04-10 2017-04-17      UK      c
14 2017-04-12 2017-04-10 2017-04-17      UK      c
15 2017-04-13 2017-04-10 2017-04-17      UK      c
16 2017-04-14 2017-04-10 2017-04-17      UK      c
17 2017-04-17 2017-04-10 2017-04-17      UK      c

The problem with my solution is that on a much bigger dataframe (mostly in terms of rows) it is not fast enough for me. Does anybody have ideas on how I could speed it up? I am also open to solutions in numpy.
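To make "not fast enough" measurable, here is a minimal timing sketch. The helper names `make_frame` and `expand_loop`, the row count, and the random date ranges are assumptions made for illustration only; the loop itself is the one shown above, wrapped in a function.

```python
import time

import numpy as np
import pandas as pd


def make_frame(n_rows, seed=0):
    """Build a synthetic frame shaped like the example (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    start = pd.Timestamp("2017-04-03") + pd.to_timedelta(
        rng.integers(0, 365, n_rows), unit="D"
    )
    end = start + pd.to_timedelta(rng.integers(1, 15, n_rows), unit="D")
    return pd.DataFrame({
        "start": start,
        "end": end,
        "country": rng.choice(["US", "EU", "UK"], n_rows),
        "letter": rng.choice(["a", "b", "c"], n_rows),
    })


def expand_loop(df):
    """Row-by-row expansion, as in the question."""
    rows = []
    for row in df.itertuples():
        time_range = pd.bdate_range(row.start, row.end)
        s = len(time_range)
        rows.extend(zip(time_range, [row.start] * s, [row.end] * s,
                        [row.country] * s, [row.letter] * s))
    return pd.DataFrame(rows, columns=["date", "start", "end", "country", "letter"])


big = make_frame(2000)
t0 = time.perf_counter()
expanded = expand_loop(big)
elapsed = time.perf_counter() - t0
print(f"{len(big)} rows -> {len(expanded)} rows in {elapsed:.3f} s")
```

Raising the argument of `make_frame` shows how the run time grows with the number of input rows, and gives a baseline to compare any alternative against.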

Eric B asked May 07 '17 14:05


1 Answer

Inspired by @StephenRauch's solution I'd like to post mine (which is pretty similar):

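import numpy as np
import pandas as pd

# df is the starting dataframe from the question (before it is overwritten).
# Build one business-day range per row and record the length of each range.
dates = [pd.bdate_range(r[0],r[1]).to_series() for r in df[['start','end']].values]
lens = [len(x) for x in dates]

# Repeat each row's values lens[i] times, then attach the flattened dates.
r = pd.DataFrame(
        {col:np.repeat(df[col].values, lens) for col in df.columns}
    ).assign(date=np.concatenate(dates))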

Result:

In [259]: r
Out[259]:
   country        end letter      start       date
0       US 2017-04-10      a 2017-04-03 2017-04-03
1       US 2017-04-10      a 2017-04-03 2017-04-04
2       US 2017-04-10      a 2017-04-03 2017-04-05
3       US 2017-04-10      a 2017-04-03 2017-04-06
4       US 2017-04-10      a 2017-04-03 2017-04-07
5       US 2017-04-10      a 2017-04-03 2017-04-10
6       EU 2017-04-12      b 2017-04-05 2017-04-05
7       EU 2017-04-12      b 2017-04-05 2017-04-06
8       EU 2017-04-12      b 2017-04-05 2017-04-07
9       EU 2017-04-12      b 2017-04-05 2017-04-10
10      EU 2017-04-12      b 2017-04-05 2017-04-11
11      EU 2017-04-12      b 2017-04-05 2017-04-12
12      UK 2017-04-17      c 2017-04-10 2017-04-10
13      UK 2017-04-17      c 2017-04-10 2017-04-11
14      UK 2017-04-17      c 2017-04-10 2017-04-12
15      UK 2017-04-17      c 2017-04-10 2017-04-13
16      UK 2017-04-17      c 2017-04-10 2017-04-14
17      UK 2017-04-17      c 2017-04-10 2017-04-17
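
Since the question also asks about numpy: the per-row call to `pd.bdate_range` could in principle be avoided with NumPy's business-day functions. The following is only a sketch and not part of the approach above. It assumes tz-naive dates at midnight and the default Monday to Friday week without holidays (which is what `pd.bdate_range` uses by default); the function name `expand_busday` and its parameters are made up for illustration.

```python
import datetime as dt

import numpy as np
import pandas as pd


def expand_busday(df, start_col="start", end_col="end"):
    """Expand each row to one row per business day in [start, end] (sketch)."""
    starts = df[start_col].to_numpy().astype("datetime64[D]")
    ends = df[end_col].to_numpy().astype("datetime64[D]")

    # First business day on or after each start date.
    first = np.busday_offset(starts, 0, roll="forward")
    # busday_count excludes its end date, so add one day to make `end` inclusive.
    counts = np.busday_count(first, ends + np.timedelta64(1, "D"))
    counts = np.clip(counts, 0, None)  # ranges without a business day yield no rows

    # Source-row position for every output row, e.g. [0, 0, 0, 1, 1, ...].
    row_idx = np.repeat(np.arange(len(df)), counts)
    # Position of each output row inside its own group: 0, 1, 2, ... per source row.
    group_start = np.cumsum(counts) - counts
    within = np.arange(counts.sum()) - np.repeat(group_start, counts)

    # Step forward `within` business days from each group's first business day.
    dates = np.busday_offset(first[row_idx], within)

    out = df.iloc[row_idx].reset_index(drop=True)
    out.insert(0, "date", dates.astype("datetime64[ns]"))
    return out


df = pd.DataFrame({
    "start": [dt.datetime(2017, 4, 3), dt.datetime(2017, 4, 5), dt.datetime(2017, 4, 10)],
    "end": [dt.datetime(2017, 4, 10), dt.datetime(2017, 4, 12), dt.datetime(2017, 4, 17)],
    "country": ["US", "EU", "UK"],
    "letter": ["a", "b", "c"],
})

out = expand_busday(df)
print(out.shape)  # (18, 5)
```

The idea is the same as with `np.repeat` above (repeat every row as many times as its range has business days), only the lengths and the dates themselves are computed by `np.busday_count` and `np.busday_offset` instead of a Python-level loop. Whether this is actually faster on a given dataset would need to be timed.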
MaxU - stop WAR against UA answered Sep 27 '22 18:09