
Expand rows by date range having start and end in Pandas

I'm working with a data set that records a phenomenon occurring during certain time frames. For each event I am given its start and end time, its severity, and some other information. I would like to expand each event into one row per hour between its start and end, spread over a larger time period, and leave the hours with no event as NaN.

Data set example:

                                date_end  severity  category
date_start
2018-01-04 07:00:00  2018-01-04 10:00:00        12         1
2018-01-04 12:00:00  2018-01-04 13:00:00        44         2

What I want is:

                     severity   category
     date_start           
2018-01-04 07:00:00     12         1
2018-01-04 08:00:00     12         1
2018-01-04 09:00:00     12         1
2018-01-04 10:00:00     12         1
2018-01-04 11:00:00     nan       nan
2018-01-04 12:00:00     44         2
2018-01-04 13:00:00     44         2
2018-01-04 14:00:00     nan       nan
2018-01-04 15:00:00     nan       nan

What would be an efficient way of achieving such a result?

Asked by Aleks-1and on Aug 16 '19 at 13:08



1 Answer

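Assuming you are on pandas v0.25 or later, use explode:

import pandas as pd

# One hourly range per row, from date_start (the row label) through date_end.
# Note: pandas 2.2+ deprecates the 'H' alias in favor of 'h'.
df['hour'] = df.apply(lambda row: pd.date_range(row.name, row['date_end'], freq='H'), axis=1)
# One row per hour; the hourly timestamps then replace date_start as the index.
df = df.explode('hour').reset_index() \
        .drop(columns=['date_start', 'date_end']) \
        .rename(columns={'hour': 'date_start'}) \
        .set_index('date_start')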
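
To try the explode step on the question's sample data, here is a minimal self-contained sketch. It is not the exact code above: it builds the per-event hour lists with a list comprehension instead of `DataFrame.apply`, uses the lowercase `"h"` frequency alias that recent pandas versions expect, and converts the exploded column back to datetimes explicitly. The name `expanded` is only illustrative.

```python
import pandas as pd

# Sample data copied from the question; date_start is the index.
df = pd.DataFrame(
    {
        "date_end": pd.to_datetime(["2018-01-04 10:00:00", "2018-01-04 13:00:00"]),
        "severity": [12, 44],
        "category": [1, 2],
    },
    index=pd.DatetimeIndex(
        ["2018-01-04 07:00:00", "2018-01-04 12:00:00"], name="date_start"
    ),
)

# One list of hourly timestamps per event, inclusive of both endpoints.
df["hour"] = pd.Series(
    [
        list(pd.date_range(start, end, freq="h"))
        for start, end in zip(df.index, df["date_end"])
    ],
    index=df.index,
)

expanded = (
    df.explode("hour")                    # one row per hourly timestamp
    .reset_index(drop=True)               # discard the original date_start labels
    .assign(hour=lambda d: pd.to_datetime(d["hour"]))  # object -> datetime64
    .drop(columns="date_end")
    .rename(columns={"hour": "date_start"})
    .set_index("date_start")
)
print(expanded)
```

At this point each event covers every hour between its start and end, but hours with no event (11:00 in the sample) are still absent.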

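To get the NaN rows for hours with no event, reindex the DataFrame against every hour you want to report on.

# Report from Jan 4 - 5, 2018, from 7AM - 7PM
days = pd.date_range('2018-01-04', '2018-01-05')
hours = pd.to_timedelta(range(7, 20), unit='h')
tmp = pd.MultiIndex.from_product([days, hours], names=['Date', 'Hour']).to_frame()

# Every day/hour combination as a timestamp; hours missing from df become NaN rows
s = tmp['Date'] + tmp['Hour']
df = df.reindex(pd.DatetimeIndex(s, name='date_start'))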
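
For the single-day window shown in the question's desired output (07:00 through 15:00), a plain `pd.date_range` may be enough as the target index. The sketch below assumes the hourly rows have already been expanded and rebuilds them by hand so that it runs on its own; `expanded`, `full_index`, and `result` are illustrative names. The final `astype("Int64")` is an optional extra, not part of the answer above: reindexing turns integer columns into floats once NaN appears, and the nullable integer dtype keeps the values displayed as 12 and 44.

```python
import pandas as pd

# Hourly rows as they look after the events have been expanded
# (values taken from the question's sample data).
expanded = pd.DataFrame(
    {"severity": [12, 12, 12, 12, 44, 44], "category": [1, 1, 1, 1, 2, 2]},
    index=pd.DatetimeIndex(
        [
            "2018-01-04 07:00", "2018-01-04 08:00", "2018-01-04 09:00",
            "2018-01-04 10:00", "2018-01-04 12:00", "2018-01-04 13:00",
        ],
        name="date_start",
    ),
)

# Target window from the question's desired output: 07:00 through 15:00.
full_index = pd.date_range(
    "2018-01-04 07:00", "2018-01-04 15:00", freq="h", name="date_start"
)

# Hours with no event get a missing value in every column.
result = expanded.reindex(full_index)

# Optional: nullable integers avoid the int -> float upcast caused by NaN.
result = result.astype("Int64")
print(result)
```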
Answered by Code Different on Sep 27 '22 at 17:09