
How to spot gaps between pandas dataframe indexes?

This code creates a DataFrame with a 10-minute DatetimeIndex:

import pandas as pd 
import datetime as dt 

date_range = pd.date_range(end=dt.datetime(2017, 1, 6, 15, 00), periods=10, freq='10Min')

df = pd.DataFrame(index=date_range)
df['A'] = 1

print(df)

It outputs:

                     A
2017-01-06 13:30:00  1
2017-01-06 13:40:00  1
2017-01-06 13:50:00  1
2017-01-06 14:00:00  1
2017-01-06 14:10:00  1
2017-01-06 14:20:00  1
2017-01-06 14:30:00  1
2017-01-06 14:40:00  1
2017-01-06 14:50:00  1
2017-01-06 15:00:00  1

My question is:

How can I set column A to 0 in the three rows that follow a gap in the index?

For example, if we delete a specific row:

df = df[df.index != dt.datetime(2017, 1, 6, 14, 00)]

It outputs:

                     A
2017-01-06 13:30:00  1
2017-01-06 13:40:00  1
2017-01-06 13:50:00  1
2017-01-06 14:10:00  1
2017-01-06 14:20:00  1
2017-01-06 14:30:00  1
2017-01-06 14:40:00  1
2017-01-06 14:50:00  1
2017-01-06 15:00:00  1

Now the 14:00 row is missing, so there is a 10-minute gap after 13:50, and A must be set to 0 in the three rows that follow it.

So this would be the desired result:

                     A
2017-01-06 13:30:00  1
2017-01-06 13:40:00  1
2017-01-06 13:50:00  1
2017-01-06 14:10:00  0
2017-01-06 14:20:00  0
2017-01-06 14:30:00  0
2017-01-06 14:40:00  1
2017-01-06 14:50:00  1
2017-01-06 15:00:00  1

There is a Python fiddle you can try: https://repl.it/FaXZ/2

mllamazares asked Mar 02 '17 11:03


2 Answers

You can use:

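# Boolean mask: True where the step from the previous index exceeds 10 minutes
mask = df.index.to_series().diff() > pd.Timedelta('00:10:00')
# Label of the first gap; idxmax returns the first True
# (if there is no gap it returns the first label, so check mask.any() first)
idx = mask.idxmax()
pos = df.index.get_loc(idx)
# Set A to 0 by position, here for the 2 rows starting at the gap
# (.iloc on the frame avoids chained assignment through df.A)
df.iloc[pos:pos + 2, df.columns.get_loc('A')] = 0
print(df)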
                     A
2017-01-06 13:30:00  1
2017-01-06 13:40:00  1
2017-01-06 13:50:00  1
2017-01-06 14:10:00  0
2017-01-06 14:20:00  0
2017-01-06 14:30:00  1
2017-01-06 14:40:00  1
2017-01-06 14:50:00  1
2017-01-06 15:00:00  1

# The slice length controls how many rows are zeroed, for example 5:
df.iloc[pos:pos + 5, df.columns.get_loc('A')] = 0
print(df)
                     A
2017-01-06 13:30:00  1
2017-01-06 13:40:00  1
2017-01-06 13:50:00  1
2017-01-06 14:10:00  0
2017-01-06 14:20:00  0
2017-01-06 14:30:00  0
2017-01-06 14:40:00  0
2017-01-06 14:50:00  0
2017-01-06 15:00:00  1
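
The question asks for the three rows after a gap, and a real index may contain more than one gap, while `idxmax` only finds the first. The following is a minimal sketch of how the positional approach above might be generalized. The helper name `zero_after_gaps` and its parameters `column`, `freq`, and `n_rows` are hypothetical and not part of the original answer, and the expected step of 10 minutes is assumed from the question.

```python
import numpy as np
import pandas as pd


def zero_after_gaps(df, column="A", freq="10min", n_rows=3):
    """Return a copy of df with `column` set to 0 in the n_rows rows
    starting at each row that follows a gap in the index."""
    out = df.copy()
    # True at every row whose distance to the previous row exceeds the step
    gap = out.index.to_series().diff() > pd.Timedelta(freq)
    col = out.columns.get_loc(column)
    # Integer positions of all gap rows, not only the first one
    for pos in np.flatnonzero(gap.to_numpy()):
        out.iloc[pos:pos + n_rows, col] = 0
    return out


# Rebuild the frame from the question and drop the 14:00 row
index = pd.date_range(end="2017-01-06 15:00", periods=10, freq="10min")
df = pd.DataFrame({"A": 1}, index=index)
df = df[df.index != pd.Timestamp("2017-01-06 14:00")]

result = zero_after_gaps(df)
print(result)
```

With the data from the question this zeroes the 14:10, 14:20 and 14:30 rows and leaves the input frame unchanged, since the helper works on a copy.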
jezrael answered Sep 23 '22 00:09


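# True where the step from the previous index exceeds 10 minutes
temp = df.index.to_series().diff() > pd.Timedelta('00:10:00')
# Zero the gap row and the row after it; fill_value keeps the shifted mask boolean
df['A'] = 1 - (temp | temp.shift(1, fill_value=False)).astype(int)

This results in: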

                     A
2017-01-06 13:30:00  1
2017-01-06 13:40:00  1
2017-01-06 13:50:00  1
2017-01-06 14:10:00  0
2017-01-06 14:20:00  0
2017-01-06 14:30:00  1
2017-01-06 14:40:00  1
2017-01-06 14:50:00  1
2017-01-06 15:00:00  1
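
The shift-based mask above covers two rows. To cover the three rows the question asks for (or any other count) without chaining more shifts, one option is a count-based rolling maximum over the gap mask. This is a sketch under assumptions rather than part of the original answer: the names `n_rows`, `gap`, and `affected` are illustrative, and the 10-minute step is taken from the question.

```python
import pandas as pd

# Rebuild the frame from the question and drop the 14:00 row
index = pd.date_range(end="2017-01-06 15:00", periods=10, freq="10min")
df = pd.DataFrame({"A": 1}, index=index)
df = df[df.index != pd.Timestamp("2017-01-06 14:00")]

n_rows = 3  # how many rows to zero, starting at the row after the gap

# True where the step from the previous index exceeds 10 minutes
gap = df.index.to_series().diff() > pd.Timedelta("10min")

# An integer window counts rows, not time: a row is affected if it or one
# of the n_rows - 1 rows before it directly follows a gap
affected = gap.astype(int).rolling(n_rows, min_periods=1).max().astype(bool)

df.loc[affected.to_numpy(), "A"] = 0
print(df)
```

Unlike the `idxmax` approach, this handles every gap in the index in one pass, and overlapping windows from nearby gaps simply merge.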
languitar answered Sep 19 '22 00:09