Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Search Missing Timestamp and display in python?

Tags:

python

pandas

Here is my some Dataset which having Time,Temperature1,Temperature2

Timestamp.             Temperature1.        Temperature2
09/01/2016 00:00:08          53.4.                       45.5
09/01/2016 00:00:38.         53.5.                       45.2
09/01/2016 00:01:08.         54.6.                        43.2
09/01/2016 00:01:38.         55.2.                        46.3
09/01/2016 00:02:08.         54.5.                        45.5
09/01/2016 00:04:08.         54.2.                       35.5
09/01/2016 00:05:08.         52.4.                        45.7
09/01/2016 00:05:38.         53.4.                         45.2

My data is coming in every 30 second..

This is my dataset here..some Timestamp is missing..bcoz. every 30 second my data is coming..so some data points are missing.. How to find that data points..and insert there data as NAN ... Please help me..

like image 760
TB.M Avatar asked Oct 21 '25 16:10

TB.M


2 Answers

you can use resample('30S', base=8) method:

In [20]: x.resample('30S', base=8).mean()
Out[20]:
                                 Temperature1    Temperature2
Timestamp
2016-09-01 00:00:08                      53.4            45.5
2016-09-01 00:00:38                      53.5            45.2
2016-09-01 00:01:08                      54.6            43.2
2016-09-01 00:01:38                      55.2            46.3
2016-09-01 00:02:08                      54.5            45.5
2016-09-01 00:02:38                       NaN             NaN
2016-09-01 00:03:08                       NaN             NaN
2016-09-01 00:03:38                       NaN             NaN
2016-09-01 00:04:08                      54.2            35.5
2016-09-01 00:04:38                       NaN             NaN
2016-09-01 00:05:08                      52.4            45.7
2016-09-01 00:05:38                      53.4            45.2

The solution above assumes that the Timestamp is of datetime dtype and that it has been set as index. If Timestamp is a regular column (not index), then starting from Pandas 0.19.0 we can resample on regular columns (it must be of datetime dtype), using on='column_name' parameter:

In [26]: x.resample('30S', on='Timestamp', base=8).mean()
Out[26]:
                                 Temperature1    Temperature2
Timestamp
2016-09-01 00:00:08                      53.4            45.5
2016-09-01 00:00:38                      53.5            45.2
2016-09-01 00:01:08                      54.6            43.2
2016-09-01 00:01:38                      55.2            46.3
2016-09-01 00:02:08                      54.5            45.5
2016-09-01 00:02:38                       NaN             NaN
2016-09-01 00:03:08                       NaN             NaN
2016-09-01 00:03:38                       NaN             NaN
2016-09-01 00:04:08                      54.2            35.5
2016-09-01 00:04:38                       NaN             NaN
2016-09-01 00:05:08                      52.4            45.7
2016-09-01 00:05:38                      53.4            45.2

if you need to find your base value dynamically you can do it this way:

In [21]: x.index[0].second
Out[21]: 8

from docs:

base : int, default 0

For frequencies that evenly subdivide 1 day, the “origin” of the aggregated intervals. For example, for 5min frequency, base could range from 0 through 4.

Defaults to 0

like image 74
MaxU - stop WAR against UA Avatar answered Oct 23 '25 06:10

MaxU - stop WAR against UA


Assuming that the timestamps have been converted to datetime, if you set the index to the timestamp column and then reindex with a date range then the missing values will show up:

In [94]:
df['Timestamp'] = pd.to_datetime(df['Timestamp'])
df = df.set_index('Timestamp')
df

Out[94]:
                     Temperature1  Temperature2
Timestamp                                      
2016-09-01 00:00:08          53.4          45.5
2016-09-01 00:00:38          53.5          45.2
2016-09-01 00:01:08          54.6          43.2
2016-09-01 00:01:38          55.2          46.3
2016-09-01 00:02:08          54.5          45.5
2016-09-01 00:04:08          54.2          35.5
2016-09-01 00:05:08          52.4          45.7
2016-09-01 00:05:38          53.4          45.2

In [96]:    
df.reindex(pd.date_range(start=df.index[0], end=df.index[-1], freq='30s'))

Out[96]:
                     Temperature1  Temperature2
2016-09-01 00:00:08          53.4          45.5
2016-09-01 00:00:38          53.5          45.2
2016-09-01 00:01:08          54.6          43.2
2016-09-01 00:01:38          55.2          46.3
2016-09-01 00:02:08          54.5          45.5
2016-09-01 00:02:38           NaN           NaN
2016-09-01 00:03:08           NaN           NaN
2016-09-01 00:03:38           NaN           NaN
2016-09-01 00:04:08          54.2          35.5
2016-09-01 00:04:38           NaN           NaN
2016-09-01 00:05:08          52.4          45.7
2016-09-01 00:05:38          53.4          45.2

This assumes that the timestamps are regular, here we construct a date range using the timestamp first and last values with a frequency of 30 seconds:

In [97]:
pd.date_range(start=df.index[0], end=df.index[-1], freq='30s')

Out[97]:
DatetimeIndex(['2016-09-01 00:00:08', '2016-09-01 00:00:38',
               '2016-09-01 00:01:08', '2016-09-01 00:01:38',
               '2016-09-01 00:02:08', '2016-09-01 00:02:38',
               '2016-09-01 00:03:08', '2016-09-01 00:03:38',
               '2016-09-01 00:04:08', '2016-09-01 00:04:38',
               '2016-09-01 00:05:08', '2016-09-01 00:05:38'],
              dtype='datetime64[ns]', freq='30S')

When you reindex with this, any missing index labels become NaN values

like image 28
EdChum Avatar answered Oct 23 '25 06:10

EdChum