I have the following dataframe with a timestamp and a value column. The timestamp increases by 5 seconds per row, but notice that there are missing records between 23:02:02 and 23:06:32.
Is there a simple way to detect whether records are missing between timestamps?
timestamp value
23:01:27 2915
23:01:32 2916
23:01:37 2919
23:01:42 2924
23:01:47 2926
23:01:52 2928
23:01:57 2933
23:02:02 2937 # <- missing timestamp
23:06:32 3102 # <- between these lines
23:06:37 3109
23:06:42 3114
23:06:47 3122
23:06:52 3126
23:06:57 3129
If your goal is to indicate where you are missing timestamps, you can convert the column to datetime, use diff to get the time difference between consecutive rows, then compare with > '00:00:05' to flag every gap longer than 5 seconds:
>>> pd.to_datetime(df['timestamp']).diff() > '00:00:05'
0 False
1 False
2 False
3 False
4 False
5 False
6 False
7 False
8 True
9 False
10 False
11 False
12 False
13 False
Name: timestamp, dtype: bool
This indicates that records are missing immediately before index 8, that is, between rows 7 and 8.
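If you also want to see where each gap starts and ends, here is a minimal sketch along the same lines. It rebuilds a few rows of the question's data; the names `is_gap` and `gaps`, and the explicit `format="%H:%M:%S"`, are my own choices and not part of the original answer:

```python
import pandas as pd

# A few rows from the question, including the jump from 23:02:02 to 23:06:32
df = pd.DataFrame({
    "timestamp": ["23:01:52", "23:01:57", "23:02:02", "23:06:32", "23:06:37"],
    "value": [2928, 2933, 2937, 3102, 3109],
})

# Parse the strings and measure the time between consecutive rows
ts = pd.to_datetime(df["timestamp"], format="%H:%M:%S")
gap = ts.diff()

# True on each row that follows a gap longer than the expected 5 seconds
is_gap = gap > pd.Timedelta(seconds=5)

# Pair each flagged row with the row just before it
gaps = pd.DataFrame({
    "gap_start": df["timestamp"].shift()[is_gap],
    "gap_end": df["timestamp"][is_gap],
    "gap_length": gap[is_gap],
})
print(gaps)
```

Each row of `gaps` is one hole in the data, with the last timestamp seen before it and the first timestamp seen after it.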
If your goal is simply to see whether you are missing timestamps, use any:
>>> (pd.to_datetime(df['timestamp']).diff() > '00:00:05').any()
True
This indicates that timestamps are indeed missing somewhere.
[EDIT] as per @JoranBeasley's suggestion, you can also use the mode of your time differences to infer the desired frequency:
>>> d = pd.to_datetime(df['timestamp']).diff()
>>> (d > d.mode()[0])
0 False
1 False
2 False
3 False
4 False
5 False
6 False
7 False
8 True
9 False
10 False
11 False
12 False
13 False
Name: timestamp, dtype: bool
This works because d.mode()[0] returns the most common time difference observed:
>>> d.mode()[0]
Timedelta('0 days 00:00:05')
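Once you have the inferred step, you could go one step further and list the timestamps that are actually missing. This is only a sketch under two assumptions that the original answer does not make: every timestamp sits exactly on the 5-second grid, and the first and last rows are present. The names `step`, `expected` and `missing` are mine:

```python
import pandas as pd

# A few rows from the question, including the jump from 23:02:02 to 23:06:32
ts = pd.to_datetime(
    pd.Series(["23:01:52", "23:01:57", "23:02:02",
               "23:06:32", "23:06:37", "23:06:42"]),
    format="%H:%M:%S",
)

# Most common difference between consecutive rows, used as the expected step
step = ts.diff().mode()[0]

# Every timestamp that should exist between the first and last observation
expected = pd.date_range(ts.iloc[0], ts.iloc[-1], freq=step)

# Expected timestamps that never appear in the data
missing = expected.difference(pd.DatetimeIndex(ts))

print(len(missing))
print(missing.strftime("%H:%M:%S")[:3].tolist())
```

If your real timestamps drift off the grid by a second or so, this exact-match comparison will report false gaps, so the diff-based check above is the safer default.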