I have a pandas
dataframe
, with Start
and End
datatime.
df=pd.DataFrame(data=pd.date_range('20100201', periods=10, freq='5h3min'),columns=['Start'])
df.loc[:,'End']=df.loc[:,'Start']+pd.Timedelta(4,'h')
Start
and End
can be expected to be sorted interally, but gaps/overlaps may occur between consecutive rows.
I would like to create a new dataframe with the difference that if row contains midnight (e.g. midnight is contained in [Start
,End
]), the row is then split in two parts before and after midnight
ex:
Start End
0 2010-02-01 00:00:00 2010-02-01 04:00:00
1 2010-02-01 05:03:00 2010-02-01 09:03:00
2 2010-02-01 10:06:00 2010-02-01 14:06:00
3 2010-02-01 15:09:00 2010-02-01 19:09:00
4 2010-02-01 20:12:00 2010-02-02 00:12:00
5 2010-02-02 01:15:00 2010-02-02 05:15:00
should be
Start End
0 2010-02-01 00:00:00 2010-02-01 04:00:00
1 2010-02-01 05:03:00 2010-02-01 09:03:00
2 2010-02-01 10:06:00 2010-02-01 14:06:00
3 2010-02-01 15:09:00 2010-02-01 19:09:00
-----------------------------------------
4 2010-02-01 20:12:00 2010-02-01 23:59:00
5 2010-02-02 00:00:00 2010-02-02 00:12:00
-----------------------------------------
6 2010-02-02 01:15:00 2010-02-02 05:15:00
You can concat the DataFrame of new pairs, then erase the old ones.
First find the splits:
splits = df[df.End.dt.date > df.Start.dt.date].copy()
Now concatenate and drop:
>>> pd.concat([
df,
pd.DataFrame({
'Start': list(splits.Start) + list(splits.End.dt.floor(freq='1D')),
'End': list(splits.Start.dt.ceil(freq='1D')) + list(splits.End)})
]).drop(splits.index).sort_values(by='Start')
End Start
0 2010-02-01 04:00:00 2010-02-01 00:00:00
1 2010-02-01 09:03:00 2010-02-01 05:03:00
2 2010-02-01 14:06:00 2010-02-01 10:06:00
3 2010-02-01 19:09:00 2010-02-01 15:09:00
0 2010-02-02 00:00:00 2010-02-01 20:12:00
2 2010-02-02 00:12:00 2010-02-02 00:00:00
5 2010-02-02 05:15:00 2010-02-02 01:15:00
6 2010-02-02 10:18:00 2010-02-02 06:18:00
7 2010-02-02 15:21:00 2010-02-02 11:21:00
8 2010-02-02 20:24:00 2010-02-02 16:24:00
1 2010-02-03 00:00:00 2010-02-02 21:27:00
3 2010-02-03 01:27:00 2010-02-03 00:00:00
I don't believe the above answer works when a midnight time occurs early in the list. Someone correct me if I'm wrong, but I believe anytime you drop the indexes of the "splits" it is then dropping too much from the original list then as well.
I recognize this isn't answering the above question, but I don't have the reputation to comment above. In my case, I believe I will likely just convert to a numpy array, insert rows where the midnight points are, and then copy data accordingly. Ugly, but should work.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With