I have the following dataframe with a timestamp and a value. The timestamp increases by 5 seconds per row, but notice that there are missing records between 23:02:02 and 23:06:32.
Is there a simple way to detect if there are missing records between timestamps?
timestamp   value
23:01:27    2915
23:01:32    2916
23:01:37    2919
23:01:42    2924
23:01:47    2926
23:01:52    2928
23:01:57    2933
23:02:02    2937 # <- missing timestamp
23:06:32    3102 # <- between these lines
23:06:37    3109
23:06:42    3114
23:06:47    3122
23:06:52    3126
23:06:57    3129
If your goal is to indicate where you are missing timestamps, you can convert to datetime, use diff to get the time difference between consecutive rows, then use > '00:00:05' to see whether each gap is greater than 5 seconds:
>>> pd.to_datetime(df['timestamp']).diff() > '00:00:05'
0     False
1     False
2     False
3     False
4     False
5     False
6     False
7     False
8      True
9     False
10    False
11    False
12    False
13    False
Name: timestamp, dtype: bool
This indicates you are missing records just before index 8, i.e. between the rows at index 7 and index 8.
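If you also want to see where each gap starts and ends, here is a minimal self-contained sketch built on the same diff idea. The dataframe is rebuilt from the sample in the question; the names `ts`, `gap`, `is_gap` and `gaps` are my own, and the explicit `format="%H:%M:%S"` is an assumption about how the timestamp strings are laid out.

```python
import pandas as pd

# Sample data copied from the question (14 rows, default integer index).
df = pd.DataFrame({
    "timestamp": ["23:01:27", "23:01:32", "23:01:37", "23:01:42", "23:01:47",
                  "23:01:52", "23:01:57", "23:02:02", "23:06:32", "23:06:37",
                  "23:06:42", "23:06:47", "23:06:52", "23:06:57"],
    "value": [2915, 2916, 2919, 2924, 2926, 2928, 2933, 2937,
              3102, 3109, 3114, 3122, 3126, 3129],
})

# Parse the strings, then take the difference between consecutive rows.
ts = pd.to_datetime(df["timestamp"], format="%H:%M:%S")
gap = ts.diff()

# True wherever the step from the previous row is longer than 5 seconds.
is_gap = gap > pd.Timedelta(seconds=5)

# One row per gap: last timestamp before it, first timestamp after it, its length.
gaps = pd.DataFrame({
    "gap_start": df["timestamp"].shift(1)[is_gap],
    "gap_end": df["timestamp"][is_gap],
    "gap_length": gap[is_gap],
})
print(gaps)
```

For the sample data this should give a single row at index 8, running from 23:02:02 to 23:06:32.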
If your goal is simply to see whether you are missing timestamps, use any:
>>> (pd.to_datetime(df['timestamp']).diff() > '00:00:05').any()
True
This indicates that you are indeed missing timestamps somewhere.
[EDIT] as per @JoranBeasley's suggestion, you can also use the mode of your time differences to infer the desired frequency:
>>> d = pd.to_datetime(df['timestamp']).diff()
>>> (d > d.mode()[0])
0     False
1     False
2     False
3     False
4     False
5     False
6     False
7     False
8      True
9     False
10    False
11    False
12    False
13    False
Name: timestamp, dtype: bool
This works because d.mode()[0] returns the most common time difference observed:
>>> d.mode()[0]
Timedelta('0 days 00:00:05')
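Once you have the inferred step, you could go one step further and list the timestamps that are actually missing. This is only a sketch under some assumptions: the short series below is made-up data standing in for df['timestamp'], the names `step`, `expected` and `missing` are mine, and it assumes every record falls exactly on the inferred grid.

```python
import pandas as pd

# Hypothetical short series with one gap (two records missing after 23:02:02).
ts = pd.to_datetime(
    pd.Series(["23:01:52", "23:01:57", "23:02:02", "23:02:17", "23:02:22"]),
    format="%H:%M:%S",
)

# Most common difference between consecutive rows, used as the expected step.
d = ts.diff()
step = d.mode()[0]

# Every timestamp we would expect between the first and last record.
expected = pd.date_range(ts.min(), ts.max(), freq=step)

# Expected timestamps that never show up in the data.
missing = expected.difference(pd.DatetimeIndex(ts))
print(missing.strftime("%H:%M:%S").tolist())  # ['23:02:07', '23:02:12']
```

If the data has jitter (records a fraction of a second off the grid), this exact-match approach would report false gaps, so you would need to round the timestamps first.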