
Identify activities in time series in Python

The graph shows water temperature against time. When there is an activation, the temperature increases. When the activation ends, the temperature starts decreasing (although sometimes there may be a time lag).

I would like to count the number of events (each blue circle represents one activation). There are also periods of random noise (red circles indicate a random temperature change; there is only an increase or only a decrease, but not both, which implies it is not a proper event).

A new temperature record is logged for every 0.5°C change in temperature, regardless of time.

I have tried using 1) the temperature difference and 2) the temperature gradient between adjacent data points to identify event start and end timestamps, counting each start/end pair as one event. But this is not very accurate.

I am told that I should use only the temperature difference and identify the pattern (increase - max temp - decrease) as one event. Any ideas on an appropriate way to calculate the total number of activations?


Update1:

Sample Data:

        id      timestamp               temperature 
27581   27822   2020-01-02 07:53:05.173 19.5    
27582   27823   2020-01-02 07:53:05.273 20.0    
27647   27888   2020-01-02 10:01:46.380 20.5    
27648   27889   2020-01-02 10:01:46.480 21.0    
27649   27890   2020-01-02 10:01:48.463 21.5    
27650   27891   2020-01-02 10:01:48.563 22.0    
27711   27952   2020-01-02 10:32:19.897 21.5    
27712   27953   2020-01-02 10:32:19.997 21.0
27861   28102   2020-01-02 11:34:41.940 21.5    
...

Update2:

Tried:

df['timestamp'] = pd.to_datetime(df['timestamp'])
df['Date'] = [datetime.datetime.date(d) for d in df['timestamp']] 
df['Date'] = pd.to_datetime(df['Date'])   
df = df[df['Date'] == '2020-01-02']

# one does not need duplicate temperature values, 
# because the task is to find changing values
df2 = df.loc[df['temperature'].shift() != df['temperature']]

# ye good olde forward difference
der = np.diff(df2['temperature'])
# to have the same length as index
der = np.insert(der,len(der),np.NaN)
# make it column
df2['sig'] = np.sign(der)

# temporary array
evts = np.zeros(len(der))
# we find that points, where the signum is changing from 1 to -1, i.e. crosses zero
evts[(df2['sig'].shift() != df2['sig'])&(0 > df2['sig'])] = 1.0
# make it column for plotting
df2['events'] = evts

# preparing plot
fig,ax = plt.subplots(figsize=(20,20))
ax.xaxis_date()
ax.xaxis.set_major_locator(plticker.MaxNLocator(20))

# temperature itself
ax.plot(df2['temperature'],'-xk')
ax2=ax.twinx()

# 'events'
ax2.plot(df2['events'],'-xg')

## uncomment next two lines for plotting of signum
# ax3=ax.twinx()
# ax3.plot(df2['sig'],'-m')

# x-axis tweaking
ax.xaxis.set_major_formatter(mdates.DateFormatter('%H:%M'))
minLim = '2020-01-02 00:07:00'
maxLim = '2020-01-02 23:59:00'
plt.xlim(mdates.date2num(pd.Timestamp(minLim)),
          mdates.date2num(pd.Timestamp(maxLim)))
plt.show()

and got a blank graph with these messages:

/usr/local/lib/python3.6/dist-packages/ipykernel_launcher.py:31: SettingWithCopyWarning:


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

/usr/local/lib/python3.6/dist-packages/ipykernel_launcher.py:38: SettingWithCopyWarning:


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

Update3:

Writing a for-loop to generate a graph for each day:

df['timestamp'] = pd.to_datetime(df['timestamp'])   
df['Date'] = df['timestamp'].dt.date     
df.set_index(df['timestamp'], inplace=True)

start_date = pd.to_datetime('2020-01-01 00:00:00')
end_date = pd.to_datetime('2020-02-01 00:00:00')
df = df.loc[(df.index >= start_date) & (df.index <= end_date)]

for date in df['Date'].unique():   
  df_date = df[df['Date'] == date]

# one does not need duplicate temperature values, 
# because the task is to find changing values
  df2 = pd.DataFrame.copy(df_date.loc[df_date['temperature'].shift() != df_date['temperature']])

# ye good olde forward difference
  der = np.sign(np.diff(df2['temperature']))
# to have the same length as index
  der = np.insert(der,len(der),np.NaN)
# make it column
  df2['sig'] = der

# temporary array
  evts = np.zeros(len(der))
# we find that points, where the signum is changing from 1 to -1, i.e. crosses zero
  evts[(df2['sig'].shift() != df2['sig'])&(0 > df2['sig'])] = 1.0
# make it column for plotting
  df2['events'] = evts

# preparing plot
  fig,ax = plt.subplots(figsize=(30,10))

  ax.xaxis_date()
# df2['timestamp'] = pd.to_datetime(df2['timestamp'])
  ax.xaxis.set_major_locator(plticker.MaxNLocator(20)) 

# temperature itself
  ax.plot(df2['temperature'],'-xk')
  ax2=ax.twinx()

# 'events'
  g= ax2.plot(df2['events'],'-xg')

# x-axis tweaking
  ax.xaxis.set_major_formatter(mdates.DateFormatter('%H:%M'))
  minLim = '2020-01-02 00:07:00'
  maxLim = '2020-01-02 23:59:00'
  plt.xlim(mdates.date2num(pd.Timestamp(minLim)),
          mdates.date2num(pd.Timestamp(maxLim)))

  ax.autoscale()     
  plt.title(date)
  print(np.count_nonzero(df2['events'][minLim:maxLim]))
  plt.show(g)

The graphs worked, but the event counts did not.


Update4:

It looks like some graphs (e.g. 2020-01-01, 2020-01-04, 2020-01-05) cover only a random fragment of time (probably the weekends). Is there a way to exclude these days?

nilsinelabore asked Nov 27 '25 23:11



1 Answer

First of all, I'd advise you to increase the number of data points, I mean in the experimental setup itself.
Nevertheless, it looks like one can extract 'events' from the data provided. The idea is simple: we need to find 'peaks', characterized by a rise-then-decline pattern. To find rises and declines it is natural to use the first-order derivative, and since we are interested only in its sign (plus for an increasing function, minus for a decreasing one), I simply took the signum of the first-order forward difference. Since we assume that there are no spontaneously occurring peaks, we need to find the points where the sign of the forward difference changes. This is, in fact, a surrogate second-order derivative; I actually achieved almost the same result using a simple second-order forward difference, but that was less handy.
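
The sign-change idea can be sketched on the sample rows from the question. This is only a minimal illustration in plain Python, not the routine itself: `count_peaks` is a hypothetical helper name, and it counts strictly a rising step followed by a falling step.

```python
def count_peaks(temps):
    """Count rise-then-fall turning points in a sequence of readings."""
    # Drop consecutive duplicates so plateaus do not hide a turning point.
    changed = [t for i, t in enumerate(temps) if i == 0 or t != temps[i - 1]]
    # Sign of the forward difference: +1 while rising, -1 while falling.
    signs = [1 if b > a else -1 for a, b in zip(changed, changed[1:])]
    # A peak is a rising step immediately followed by a falling step.
    return sum(1 for s, t in zip(signs, signs[1:]) if s > 0 and t < 0)


# Temperatures from the sample rows in the question.
sample = [19.5, 20.0, 20.5, 21.0, 21.5, 22.0, 21.5, 21.0, 21.5]
peaks = count_peaks(sample)
print(peaks)
```

For these nine readings it reports one peak, at the 22.0 reading. A run that only rises or only falls contributes nothing, which matches the requirement that a one-sided change is not an event.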


I used the following routine:

# imports
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import matplotlib.ticker as plticker
# endimports

# path to csv
path = r'JanuaryData.csv'
# reading the csv
df = pd.read_csv(path,usecols=['timestamp','temperature'],parse_dates=True, index_col='timestamp')

# selecting the part for the analysis
startDate = '2020-01-01 00:00:00'
endDate = '2020-01-03 23:59:00'
df = df.loc[startDate:endDate]

# one does not need duplicate temperature values, 
# because the task is to find changing values
df2 = df.loc[df['temperature'].shift() != df['temperature']]

# ye good olde forward difference
der = np.diff(df2['temperature'])
# to have the same length as index
der = np.insert(der,len(der),np.NaN)
# make it column
df2['sig'] = np.sign(der)

# temporary array
evts = np.zeros(len(der))
# find the points where the signum changes from 1 to -1, i.e. crosses zero
evts[(df2['sig'].shift() != df2['sig'])&(0 > df2['sig'])] = 1.0
# make it column for plotting
df2['events'] = evts

# preparing plot
fig,ax = plt.subplots(figsize=(20,20))
ax.xaxis_date()
ax.xaxis.set_major_locator(plticker.MaxNLocator(20))

# temperature itself
ax.plot(df2['temperature'],'-xk')
ax2=ax.twinx()

# 'events'
ax2.plot(df2['events'],'-xg')

## uncomment next two lines for plotting of signum
# ax3=ax.twinx()
# ax3.plot(df2['sig'],'-m')

# x-axis tweaking
ax.xaxis.set_major_formatter(mdates.DateFormatter('%H:%M'))
minLim = '2020-01-02 00:07:00'
maxLim = '2020-01-02 23:59:00'
plt.xlim(mdates.date2num(pd.Timestamp(minLim)),
          mdates.date2num(pd.Timestamp(maxLim)))
plt.show()

In the image produced by the code, the peaks of the green curve mark the beginning of the corresponding temperature peak (sorry for the not-so-visual representation). I also tried analyzing the other data in the .csv, and it looks like the algorithm works well.


EDIT #1: replace the line

df2 = df.loc[df['temperature'].shift() != df['temperature']]

with

df2 = pd.DataFrame.copy(df.loc[df['temperature'].shift() != df['temperature']])

to get rid of SettingWithCopyWarning.

Also rewrite the forward-difference lines from

# ye good olde forward difference
der = np.diff(df2['temperature'])
# to have the same length as index
der = np.insert(der,len(der),np.NaN)
# make it column
df2['sig'] = np.sign(der)

to

# ye good olde forward difference
der = np.sign(np.diff(df2['temperature']))
# to have the same length as index
der = np.insert(der,len(der),np.nan)  # np.nan: the np.NaN alias was removed in NumPy 2.0
# make it column
df2['sig'] = der

to prevent the np.sign() warning about the NaN value.
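
Put together, the two changes from EDIT #1 might look like the sketch below. The miniature frame is hypothetical and only stands in for the CSV data; `.copy()` called on the filtered frame is the method form of `pd.DataFrame.copy(...)`, and `np.append` is used here in place of `np.insert` at the end.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature frame standing in for the CSV data.
df = pd.DataFrame(
    {"temperature": [19.5, 19.5, 20.0, 20.5, 20.5, 20.0]},
    index=pd.date_range("2020-01-02 08:00", periods=6, freq="min"),
)

# Filter first, then take an explicit copy so that later column
# assignments act on an independent frame, not a possible view of df.
changed = df["temperature"].shift() != df["temperature"]
df2 = df.loc[changed].copy()

# Take the sign before padding, so np.sign never sees the NaN.
der = np.sign(np.diff(df2["temperature"]))
# Pad with NaN so the column has the same length as the index.
df2["sig"] = np.append(der, np.nan)
print(df2)
```

Because `df2` is an explicit copy, adding the `sig` column leaves `df` untouched and should not trigger SettingWithCopyWarning.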


EDIT #2: to print the number of events in a range, use

print(np.count_nonzero(df2['events'][minLim:maxLim]))

For the limits used above it prints 6; for the entire dataset it gives 174.
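
The per-day loop in the question slices every day with the same fixed minLim:maxLim, which is likely why its counts looked wrong. One possible alternative, sketched under assumptions, is to mark events once and group them by calendar date. The two-day series, the `mark_events` helper, and the `MIN_ROWS` threshold for skipping fragmentary days are all hypothetical and not part of the original data or routine.

```python
import numpy as np
import pandas as pd


def mark_events(frame):
    """Flag rows using the same sign-change rule as the routine above."""
    # Keep only rows where the temperature changed, as an independent copy.
    out = frame.loc[frame["temperature"].shift() != frame["temperature"]].copy()
    # Sign of the forward difference, padded with NaN to match the index.
    out["sig"] = np.append(np.sign(np.diff(out["temperature"])), np.nan)
    # An event is a row whose sign is negative and differs from the previous one.
    is_event = (out["sig"].shift() != out["sig"]) & (out["sig"] < 0)
    out["events"] = is_event.astype(float)
    return out


# Hypothetical two-day series standing in for the CSV data.
idx = pd.to_datetime([
    "2020-01-02 08:00", "2020-01-02 09:00", "2020-01-02 10:00",
    "2020-01-02 11:00", "2020-01-02 12:00",
    "2020-01-03 08:00", "2020-01-03 09:00", "2020-01-03 10:00",
])
temps = [20.0, 20.5, 20.0, 20.5, 20.0, 20.0, 20.5, 20.0]
marked = mark_events(pd.DataFrame({"temperature": temps}, index=idx))

# Group by calendar date instead of slicing with one fixed time window.
days = marked.index.date
events_per_day = marked["events"].groupby(days).sum().astype(int)
rows_per_day = marked["events"].groupby(days).size()

# MIN_ROWS is an assumed threshold for skipping days with only a fragment of data.
MIN_ROWS = 3
kept = events_per_day[rows_per_day >= MIN_ROWS]
print(events_per_day)
print(kept)
```

A suitable `MIN_ROWS` depends on how many records a normal day produces, so it would need to be chosen by looking at the real data.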

Suthiro answered Nov 29 '25 14:11



