The graph shows water temperature against time. When there is an activation, the temperature increases. When the activation ends, the temperature starts decreasing (although sometimes there is a time lag).

I would like to count the number of events (each blue circle represents one activation). There are also periods of random noise (red circles). These are random temperature changes, but you can see that the temperature only increases or only decreases, not both, so they are not proper events.
A new temperature record is written for every 0.5°C change in temperature, regardless of time.
I have tried using 1) the temperature difference and 2) the temperature gradient between adjacent data points to identify the start and end timestamps of each event, counting each start/end pair as one event. But this is not very accurate.
I am told that I should use only the temperature difference and identify the pattern of (increase - max temp - decrease) as one event. Any ideas what is an appropriate way to calculate the total number of activations?
Update1:
Sample Data:
          id               timestamp  temperature
27581  27822 2020-01-02 07:53:05.173         19.5
27582  27823 2020-01-02 07:53:05.273         20.0
27647  27888 2020-01-02 10:01:46.380         20.5
27648  27889 2020-01-02 10:01:46.480         21.0
27649  27890 2020-01-02 10:01:48.463         21.5
27650  27891 2020-01-02 10:01:48.563         22.0
27711  27952 2020-01-02 10:32:19.897         21.5
27712  27953 2020-01-02 10:32:19.997         21.0
27861  28102 2020-01-02 11:34:41.940         21.5
...
Update2:
Tried:
df['timestamp'] = pd.to_datetime(df['timestamp'])
df['Date'] = [datetime.datetime.date(d) for d in df['timestamp']]
df['Date'] = pd.to_datetime(df['Date'])
df = df[df['Date'] == '2020-01-02']
# one does not need duplicate temperature values,
# because the task is to find changing values
df2 = df.loc[df['temperature'].shift() != df['temperature']]
# ye good olde forward difference
der = np.diff(df2['temperature'])
# to have the same length as index
der = np.insert(der,len(der),np.NaN)
# make it column
df2['sig'] = np.sign(der)
# temporary array
evts = np.zeros(len(der))
# we find that points, where the signum is changing from 1 to -1, i.e. crosses zero
evts[(df2['sig'].shift() != df2['sig'])&(0 > df2['sig'])] = 1.0
# make it column for plotting
df2['events'] = evts
# preparing plot
fig,ax = plt.subplots(figsize=(20,20))
ax.xaxis_date()
ax.xaxis.set_major_locator(plticker.MaxNLocator(20))
# temperature itself
ax.plot(df2['temperature'],'-xk')
ax2=ax.twinx()
# 'events'
ax2.plot(df2['events'],'-xg')
## uncomment next two lines for plotting of signum
# ax3=ax.twinx()
# ax3.plot(df2['sig'],'-m')
# x-axis tweaking
ax.xaxis.set_major_formatter(mdates.DateFormatter('%H:%M'))
minLim = '2020-01-02 00:07:00'
maxLim = '2020-01-02 23:59:00'
plt.xlim(mdates.date2num(pd.Timestamp(minLim)),
mdates.date2num(pd.Timestamp(maxLim)))
plt.show()
and got a blank graph along with these warnings:
/usr/local/lib/python3.6/dist-packages/ipykernel_launcher.py:31: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
/usr/local/lib/python3.6/dist-packages/ipykernel_launcher.py:38: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
Update3:
Writing a for-loop to generate a graph for each day:
df['timestamp'] = pd.to_datetime(df['timestamp'])
df['Date'] = df['timestamp'].dt.date
df.set_index(df['timestamp'], inplace=True)
start_date = pd.to_datetime('2020-01-01 00:00:00')
end_date = pd.to_datetime('2020-02-01 00:00:00')
df = df.loc[(df.index >= start_date) & (df.index <= end_date)]
for date in df['Date'].unique():
    df_date = df[df['Date'] == date]
    # one does not need duplicate temperature values,
    # because the task is to find changing values
    df2 = pd.DataFrame.copy(df_date.loc[df_date['temperature'].shift() != df_date['temperature']])
    # ye good olde forward difference
    der = np.sign(np.diff(df2['temperature']))
    # to have the same length as index
    der = np.insert(der,len(der),np.NaN)
    # make it column
    df2['sig'] = der
    # temporary array
    evts = np.zeros(len(der))
    # we find that points, where the signum is changing from 1 to -1, i.e. crosses zero
    evts[(df2['sig'].shift() != df2['sig'])&(0 > df2['sig'])] = 1.0
    # make it column for plotting
    df2['events'] = evts
    # preparing plot
    fig,ax = plt.subplots(figsize=(30,10))
    ax.xaxis_date()
    # df2['timestamp'] = pd.to_datetime(df2['timestamp'])
    ax.xaxis.set_major_locator(plticker.MaxNLocator(20))
    # temperature itself
    ax.plot(df2['temperature'],'-xk')
    ax2=ax.twinx()
    # 'events'
    g= ax2.plot(df2['events'],'-xg')
    # x-axis tweaking
    ax.xaxis.set_major_formatter(mdates.DateFormatter('%H:%M'))
    minLim = '2020-01-02 00:07:00'
    maxLim = '2020-01-02 23:59:00'
    plt.xlim(mdates.date2num(pd.Timestamp(minLim)),
             mdates.date2num(pd.Timestamp(maxLim)))
    ax.autoscale()
    plt.title(date)
    print(np.count_nonzero(df2['events'][minLim:maxLim]))
    plt.show(g)
The graphs are produced, but the printed event counts are not correct.
Update4:

It looks like some graphs (e.g. 2020-01-01, 2020-01-04, 2020-01-05) cover only a random fragment of the day (probably because they fall on weekends). Is there a way to drop these days?
First of all, I'd advise you to increase the number of data points, I mean in the experimental setup itself.
Nevertheless, it looks like one can extract 'events' from the data provided. The idea is simple: we need to find 'peaks', characterized by a rise-then-decline pattern. To find rises and declines it is natural to use the first-order derivative, and since we are interested only in its sign (plus for an increasing function, minus for a decreasing one), I simply took the signum of the first-order forward difference. Since we assume that there are no spontaneously occurring peaks, we need to find the points where the sign of the forward difference changes. This is, in effect, a surrogate second-order derivative; I actually got almost the same result using a plain second-order forward difference, but it was less handy.
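To make the idea concrete, here is a minimal sketch on the nine readings from the sample data in the question. `count_peaks` is a hypothetical helper name, not part of the routine below, and it is slightly stricter than that routine: it only counts a point when a rise is immediately followed by a decline, so a series that starts with a decline is not counted.

```python
import numpy as np

def count_peaks(temps):
    """Count rise-then-decline turning points in a sequence of readings."""
    temps = np.asarray(temps, dtype=float)
    if temps.size < 3:
        return 0  # a peak needs at least rise + decline, i.e. three points
    # drop consecutive duplicates so that every difference is non-zero
    keep = np.insert(np.diff(temps) != 0, 0, True)
    temps = temps[keep]
    # sign of the forward difference: +1 rising, -1 falling
    sig = np.sign(np.diff(temps))
    # a peak is a +1 step immediately followed by a -1 step
    return int(np.sum((sig[:-1] > 0) & (sig[1:] < 0)))

# temperatures copied from the sample data in the question
sample = [19.5, 20.0, 20.5, 21.0, 21.5, 22.0, 21.5, 21.0, 21.5]
print(count_peaks(sample))
```

A monotonic run such as `[20.0, 20.5, 21.0]` yields zero, which matches the "only increases or only decreases" description of noise in the question.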
I used the following routine:
# imports
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import matplotlib.ticker as plticker
# endimports
# path to csv
path = r'JanuaryData.csv'
# reading the csv
df = pd.read_csv(path,usecols=['timestamp','temperature'],parse_dates=True, index_col='timestamp')
# selecting the part for the analysis
startDate = '2020-01-01 00:00:00'
endDate = '2020-01-03 23:59:00'
df = df.loc[startDate:endDate]
# one does not need duplicate temperature values,
# because the task is to find changing values
df2 = df.loc[df['temperature'].shift() != df['temperature']]
# ye good olde forward difference
der = np.diff(df2['temperature'])
# to have the same length as index
der = np.insert(der,len(der),np.NaN)
# make it column
df2['sig'] = np.sign(der)
# temporary array
evts = np.zeros(len(der))
# we find that points, where the signum is changing from 1 to -1, i.e. crosses zero
evts[(df2['sig'].shift() != df2['sig'])*(0 > df2['sig'])] = 1.0
# make it column for plotting
df2['events'] = evts
# preparing plot
fig,ax = plt.subplots(figsize=(20,20))
ax.xaxis_date()
ax.xaxis.set_major_locator(plticker.MaxNLocator(20))
# temperature itself
ax.plot(df2['temperature'],'-xk')
ax2=ax.twinx()
# 'events'
ax2.plot(df2['events'],'-xg')
## uncomment next two lines for plotting of signum
# ax3=ax.twinx()
# ax3.plot(df2['sig'],'-m')
# x-axis tweaking
ax.xaxis.set_major_formatter(mdates.DateFormatter('%H:%M'))
minLim = '2020-01-02 00:07:00'
maxLim = '2020-01-02 23:59:00'
plt.xlim(mdates.date2num(pd.Timestamp(minLim)),
mdates.date2num(pd.Timestamp(maxLim)))
plt.show()
The image produced by the code:
The peaks of the green curve mark the beginning of the corresponding temperature peak; sorry for the not-so-visual representation.
I tried to analyze the other data in the .csv, and it looks like the algorithm works well.
EDIT #1: replace the line
df2 = df.loc[df['temperature'].shift() != df['temperature']]
with
df2 = pd.DataFrame.copy(df.loc[df['temperature'].shift() != df['temperature']])
to get rid of the SettingWithCopyWarning.
Also rewrite the forward-difference lines from
# ye good olde forward difference
der = np.diff(df2['temperature'])
# to have the same length as index
der = np.insert(der,len(der),np.NaN)
# make it column
df2['sig'] = np.sign(der)
to
# ye good olde forward difference
der = np.sign(np.diff(df2['temperature']))
# to have the same length as index
der = np.insert(der,len(der),np.NaN)
# make it column
df2['sig'] = der
to prevent the np.sign() warning about the NaN value.
EDIT #2: to print the number of events in a range, use
print(np.count_nonzero(df2['events'][minLim:maxLim]))
For the limits used above it prints 6; for the entire dataset it gives 174.
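If a count per calendar day is wanted rather than a count for one fixed `minLim`/`maxLim` window, one possible approach is to group the frame by date and apply the same sign-change test to each group. The sketch below is an assumption-laden variation, not the routine above: `events_per_day` and its `min_changes` threshold are hypothetical names, it assumes a DatetimeIndex and a `temperature` column, and it only counts a rise immediately followed by a decline, so its numbers may differ slightly from the figures quoted above.

```python
import numpy as np
import pandas as pd

def events_per_day(df, min_changes=3):
    """Return a Series of event counts indexed by calendar day.

    df must have a DatetimeIndex and a 'temperature' column.
    Days with fewer than `min_changes` distinct consecutive readings
    are skipped (hypothetical threshold for fragmentary days).
    """
    counts = {}
    for day, chunk in df.groupby(df.index.date):
        temps = chunk['temperature']
        # drop consecutive duplicates, as in the routine above
        temps = temps[temps.shift() != temps]
        if len(temps) < max(min_changes, 3):
            continue  # too few points to contain a rise and a decline
        sig = np.sign(np.diff(temps.to_numpy()))
        # rise (+1) immediately followed by decline (-1)
        counts[day] = int(np.sum((sig[:-1] > 0) & (sig[1:] < 0)))
    return pd.Series(counts, dtype=int)

# made-up readings purely for illustration
demo = pd.DataFrame(
    {'temperature': [19.5, 20.0, 20.5, 20.0, 21.0]},
    index=pd.to_datetime(['2020-01-02 07:53', '2020-01-02 08:00',
                          '2020-01-02 09:00', '2020-01-02 10:00',
                          '2020-01-03 10:00']),
)
print(events_per_day(demo))
```

Raising `min_changes` is one way to leave out days that contain only a short fragment of data; the right value depends on the dataset and is not something the original routine specifies.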