expanding a dataframe based on start and end columns (speed)

Tags: python, pandas, numpy

I have a pandas.DataFrame containing start and end columns, plus a couple of additional columns. I would like to expand this dataframe into a time series that runs from each row's start value to its end value (business days only, as produced by pd.bdate_range), copying the other columns onto every generated row. So far I have come up with the following:

import pandas as pd
import datetime as dt

df = pd.DataFrame()
df['start'] = [dt.datetime(2017, 4, 3), dt.datetime(2017, 4, 5), dt.datetime(2017, 4, 10)]
df['end'] = [dt.datetime(2017, 4, 10), dt.datetime(2017, 4, 12), dt.datetime(2017, 4, 17)]
df['country'] = ['US', 'EU', 'UK']
df['letter'] = ['a', 'b', 'c']

data_series = list()
for row in df.itertuples():
    time_range = pd.bdate_range(row.start, row.end)
    s = len(time_range)
    data_series += (zip(time_range, [row.start]*s, [row.end]*s, [row.country]*s, [row.letter]*s))

columns_names = ['date', 'start', 'end', 'country', 'letter']
df = pd.DataFrame(data_series, columns=columns_names)

Starting Dataframe:

       start        end country letter
0 2017-04-03 2017-04-10      US      a
1 2017-04-05 2017-04-12      EU      b
2 2017-04-10 2017-04-17      UK      c

Desired output:

         date      start        end country letter
0  2017-04-03 2017-04-03 2017-04-10      US      a
1  2017-04-04 2017-04-03 2017-04-10      US      a
2  2017-04-05 2017-04-03 2017-04-10      US      a
3  2017-04-06 2017-04-03 2017-04-10      US      a
4  2017-04-07 2017-04-03 2017-04-10      US      a
5  2017-04-10 2017-04-03 2017-04-10      US      a
6  2017-04-05 2017-04-05 2017-04-12      EU      b
7  2017-04-06 2017-04-05 2017-04-12      EU      b
8  2017-04-07 2017-04-05 2017-04-12      EU      b
9  2017-04-10 2017-04-05 2017-04-12      EU      b
10 2017-04-11 2017-04-05 2017-04-12      EU      b
11 2017-04-12 2017-04-05 2017-04-12      EU      b
12 2017-04-10 2017-04-10 2017-04-17      UK      c
13 2017-04-11 2017-04-10 2017-04-17      UK      c
14 2017-04-12 2017-04-10 2017-04-17      UK      c
15 2017-04-13 2017-04-10 2017-04-17      UK      c
16 2017-04-14 2017-04-10 2017-04-17      UK      c
17 2017-04-17 2017-04-10 2017-04-17      UK      c

The problem with my solution is that on a much bigger dataframe (mostly in terms of rows) it is not fast enough for me. Does anybody have ideas on how I could improve it? I am also open to solutions in numpy.
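To make the slowdown measurable, here is a rough timing sketch. It wraps the same row-by-row logic in a function and runs it on synthetic data; the helper names expand_with_loop and make_frame, the row count, and the random date ranges are all assumptions made for illustration, not my real data.

```python
import time

import numpy as np
import pandas as pd


def expand_with_loop(df):
    """Row-by-row expansion, same logic as the loop in the question."""
    rows = []
    for row in df.itertuples():
        time_range = pd.bdate_range(row.start, row.end)
        s = len(time_range)
        rows += zip(time_range, [row.start] * s, [row.end] * s,
                    [row.country] * s, [row.letter] * s)
    return pd.DataFrame(rows, columns=['date', 'start', 'end', 'country', 'letter'])


def make_frame(n_rows, seed=0):
    """Hypothetical synthetic input: random start dates and 1 to 14 day spans."""
    rng = np.random.default_rng(seed)
    start = pd.Timestamp('2017-04-03') + pd.to_timedelta(rng.integers(0, 365, n_rows), unit='D')
    end = start + pd.to_timedelta(rng.integers(1, 15, n_rows), unit='D')
    return pd.DataFrame({'start': start, 'end': end,
                         'country': rng.choice(['US', 'EU', 'UK'], n_rows),
                         'letter': rng.choice(list('abc'), n_rows)})


big = make_frame(2000)
t0 = time.perf_counter()
expanded = expand_with_loop(big)
elapsed = time.perf_counter() - t0
print(f"{len(big)} rows -> {len(expanded)} rows in {elapsed:.3f} s")
```

The absolute numbers depend on the machine and pandas version, so the sketch only prints what it measures. Any faster variant can be timed against the same frame.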

asked May 07 '17 14:05

Eric B

1 Answer

Inspired by @StephenRauch's solution I'd like to post mine (which is pretty similar):

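import numpy as np
import pandas as pd

# df is the starting dataframe from the question (start, end, country, letter).
# Build one business-day range per row from its start/end pair.
dates = [pd.bdate_range(r[0],r[1]).to_series() for r in df[['start','end']].values]
# Length of each range, i.e. how many times each original row is repeated.
lens = [len(x) for x in dates]

r = pd.DataFrame(
        {col:np.repeat(df[col].values, lens) for col in df.columns}
    ).assign(date=np.concatenate(dates))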

Result:

In [259]: r
Out[259]:
   country        end letter      start       date
0       US 2017-04-10      a 2017-04-03 2017-04-03
1       US 2017-04-10      a 2017-04-03 2017-04-04
2       US 2017-04-10      a 2017-04-03 2017-04-05
3       US 2017-04-10      a 2017-04-03 2017-04-06
4       US 2017-04-10      a 2017-04-03 2017-04-07
5       US 2017-04-10      a 2017-04-03 2017-04-10
6       EU 2017-04-12      b 2017-04-05 2017-04-05
7       EU 2017-04-12      b 2017-04-05 2017-04-06
8       EU 2017-04-12      b 2017-04-05 2017-04-07
9       EU 2017-04-12      b 2017-04-05 2017-04-10
10      EU 2017-04-12      b 2017-04-05 2017-04-11
11      EU 2017-04-12      b 2017-04-05 2017-04-12
12      UK 2017-04-17      c 2017-04-10 2017-04-10
13      UK 2017-04-17      c 2017-04-10 2017-04-11
14      UK 2017-04-17      c 2017-04-10 2017-04-12
15      UK 2017-04-17      c 2017-04-10 2017-04-13
16      UK 2017-04-17      c 2017-04-10 2017-04-14
17      UK 2017-04-17      c 2017-04-10 2017-04-17
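
Separately from the answer above, and since the question mentions numpy, here is a minimal sketch of a variant with no Python-level loop. It expands every row to all calendar days with np.repeat and then keeps only business days with np.is_busday. The function name expand_business_days and its parameters are hypothetical. The sketch assumes every end is on or after its start and that the default Monday to Friday week without holidays is wanted, which should correspond to the pd.bdate_range default.

```python
import numpy as np
import pandas as pd


def expand_business_days(df, start_col="start", end_col="end", date_col="date"):
    """Repeat each row once per business day between start and end (inclusive)."""
    start = df[start_col].to_numpy().astype("datetime64[D]")
    end = df[end_col].to_numpy().astype("datetime64[D]")

    # Calendar days covered by each row, both endpoints included.
    n_days = (end - start).astype(np.int64) + 1

    # Position of the source row for every generated calendar day.
    row_idx = np.repeat(np.arange(len(df)), n_days)

    # Day offset within each row: 0, 1, ..., n_days - 1.
    first = np.cumsum(n_days) - n_days
    offsets = np.arange(n_days.sum()) - np.repeat(first, n_days)

    dates = start[row_idx] + offsets.astype("timedelta64[D]")

    # Drop Saturdays and Sundays (default weekmask of np.is_busday).
    keep = np.is_busday(dates)

    out = df.iloc[row_idx[keep]].reset_index(drop=True)
    out.insert(0, date_col, dates[keep].astype("datetime64[ns]"))
    return out


sample = pd.DataFrame({
    "start": pd.to_datetime(["2017-04-03", "2017-04-05", "2017-04-10"]),
    "end": pd.to_datetime(["2017-04-10", "2017-04-12", "2017-04-17"]),
    "country": ["US", "EU", "UK"],
    "letter": ["a", "b", "c"],
})
expanded = expand_business_days(sample)
print(expanded)
```

The trade-off is memory: calendar days are materialised before the weekend filter, so roughly 7/5 of the final row count is held temporarily. Whether this is actually faster than the list comprehension should be measured on the real data.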

answered Sep 27 '22 18:09

MaxU - stop WAR against UA
