 

Real-time statistical analysis

Tags:

python

numpy

I need to do some real-time data analysis to monitor for operational errors. More specifically, I'm controlling a winch on a buoy that lowers an instrument package down through the water. I need to detect whether it has hit the bottom, and stop the winch if it has. I have the following data: the depth of the sensor and the rate at which the winch is unspooling. I get updates at 1 Hz, and the entire process lasts about 5 minutes. If the sensor hits the bottom, the depth readings will usually slow dramatically and eventually stop changing. It can be assumed that under ideal circumstances the rate of descent is linear, but due to waves, there can be a fair amount of noise.

I came up with this method:

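```python
'''
The variables sensor_depth, winch_velocity and sample_time are assumed to be
updated in the background by another thread. winch_is_running(), new_sample(),
stop_winch() and allowed_slope_error are likewise assumed to be defined elsewhere.
'''
import numpy as np
from time import sleep

x_data = []
y_data = []
running_size = 10  # number of recent samples used for the "recent" slope

while winch_is_running():
    if new_sample():
        x_data.append(sample_time)
        y_data.append(sensor_depth)
        if len(x_data) < running_size:
            continue  # not enough samples yet for the recent-window fit
        # Slope for the entire procedure: fit depth = slope * t + offset
        A = np.vstack([x_data, np.ones(len(x_data))]).T
        overall_slope, offset = np.linalg.lstsq(A, y_data, rcond=None)[0]
        # Slope for the most recent running_size samples (note the slice)
        A = np.vstack([x_data[-running_size:], np.ones(running_size)]).T
        recent_slope, offset = np.linalg.lstsq(A, y_data[-running_size:], rcond=None)[0]
        # A recent slope well below the overall slope suggests the descent has stalled
        if overall_slope - recent_slope > allowed_slope_error:
            stop_winch()
    else:
        sleep(.2)
```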

Does this make sense, or is there a better way?

Here's some sample data from the current system. It wasn't a particularly rough day, and there was no bottom strike. The current system uses a Motorola 68k-based TattleTale controller running their version of BASIC. The bottom-strike algorithm just compares the depth every x samples and stops the winch if the difference isn't big enough. While this works, it is prone to false positives when it is rough, and it has poor response in calm conditions:

    Date     Time     Temp   Cond   Sal     DO     DEPTH    Turb Chlor
    11/11/10 15:00:19 14.24  18.44  10.97   2.53   0.092     0.5  13.5
    11/11/10 15:00:20 14.24  18.44  10.97   2.53   0.126     0.7   9.5
    11/11/10 15:00:21 14.24  18.45  10.97   2.53   0.132     0.6  13.0
    11/11/10 15:00:22 14.24  18.44  10.96   2.53   0.152     0.6   8.6
    11/11/10 15:00:23 14.24  18.44  10.96   2.53   0.139     0.7  13.6
    11/11/10 15:00:24 14.24  18.44  10.97   2.52   0.120     0.7  13.5
    11/11/10 15:00:25 14.24  18.44  10.97   2.52   0.128     1.4   7.1
    11/11/10 15:00:26 14.24  18.44  10.96   2.52   0.128     0.6   7.9
    11/11/10 15:00:27 14.24  18.44  10.97   2.52   0.141     0.9  12.4
    11/11/10 15:00:28 14.24  18.44  10.97   2.51   0.135     1.3  12.7
    11/11/10 15:00:29 14.24  18.44  10.96   2.51   0.145     1.3  12.8
    11/11/10 15:00:30 14.24  18.44  10.96   2.51   0.163     0.6   4.8
    11/11/10 15:00:31 14.24  18.44  10.96   2.51   0.213     0.9   3.9
    11/11/10 15:00:32 14.24  18.44  10.97   2.51   0.211     0.6   7.1
    11/11/10 15:00:33 14.24  18.44  10.96   2.51   0.241     0.7   6.9
    11/11/10 15:00:34 14.24  18.44  10.96   2.51   0.286     0.5   9.8
    11/11/10 15:00:35 14.24  18.44  10.96   2.51   0.326     0.6   9.0
    11/11/10 15:00:36 14.24  18.44  10.96   2.51   0.358     0.7   3.3
    11/11/10 15:00:37 14.24  18.44  10.96   2.51   0.425     0.7  13.1
    11/11/10 15:00:38 14.24  18.43  10.96   2.51   0.419     0.8   5.3
    11/11/10 15:00:39 14.24  18.44  10.96   2.51   0.495     1.2   7.4
    11/11/10 15:00:40 14.24  18.44  10.96   2.50   0.504     0.7  16.1
    11/11/10 15:00:41 14.24  18.44  10.96   2.50   0.558     0.5  11.9
    11/11/10 15:00:42 14.24  18.44  10.96   2.50   0.585     0.8   8.8
    11/11/10 15:00:43 14.24  18.44  10.96   2.50   0.645     0.8   9.7
    11/11/10 15:00:44 14.24  18.44  10.96   2.50   0.654     0.6   5.2
    11/11/10 15:00:45 14.24  18.44  10.96   2.50   0.694     0.5   9.5
    11/11/10 15:00:46 14.24  18.44  10.96   2.50   0.719     0.7   5.9
    11/11/10 15:00:47 14.24  18.44  10.96   2.50   0.762     0.9   7.2
    11/11/10 15:00:48 14.24  18.44  10.96   2.50   0.815     1.0  11.1
    11/11/10 15:00:49 14.24  18.44  10.96   2.50   0.807     0.6   8.7
    11/11/10 15:00:50 14.24  18.44  10.96   2.50   0.884     0.4   0.4
    11/11/10 15:00:51 14.24  18.44  10.96   2.50   0.865     0.7  13.3
    11/11/10 15:00:52 14.25  18.45  10.97   2.49   0.917     1.2   7.3
    11/11/10 15:00:53 14.24  18.45  10.97   2.49   0.964     0.5   4.8
    11/11/10 15:00:54 14.25  18.44  10.97   2.49   0.967     0.6   9.7
    11/11/10 15:00:55 14.25  18.44  10.97   2.49   1.024     0.5   8.1
    11/11/10 15:00:56 14.25  18.45  10.97   2.49   1.042     1.0  14.3
    11/11/10 15:00:57 14.25  18.45  10.97   2.49   1.074     0.7   6.0
    11/11/10 15:00:58 14.26  18.46  10.97   2.49   1.093     0.9   9.0
    11/11/10 15:00:59 14.26  18.46  10.98   2.49   1.145     0.7   9.1
    11/11/10 15:01:00 14.26  18.46  10.98   2.49   1.155     1.7   8.6
    11/11/10 15:01:01 14.25  18.47  10.98   2.49   1.205     0.7   8.8
    11/11/10 15:01:02 14.25  18.48  10.99   2.49   1.237     0.8  12.9
    11/11/10 15:01:03 14.26  18.48  10.99   2.49   1.248     0.7   7.2
    11/11/10 15:01:04 14.27  18.50  11.00   2.48   1.305     1.2   9.8
    11/11/10 15:01:05 14.28  18.50  11.00   2.48   1.328     0.7  10.6
    11/11/10 15:01:06 14.29  18.49  11.00   2.48   1.367     0.6   5.4
    11/11/10 15:01:07 14.29  18.51  11.01   2.48   1.387     0.8   9.2
    11/11/10 15:01:08 14.30  18.51  11.01   2.48   1.425     0.6  14.1
    11/11/10 15:01:09 14.31  18.52  11.01   2.48   1.456     4.0  11.3
    11/11/10 15:01:10 14.31  18.52  11.01   2.47   1.485     2.5   5.3
    11/11/10 15:01:11 14.31  18.51  11.01   2.47   1.490     0.7   5.2
    11/11/10 15:01:12 14.32  18.52  11.01   2.47   1.576     0.6   6.6
    11/11/10 15:01:13 14.32  18.51  11.01   2.47   1.551     0.7   7.7
    11/11/10 15:01:14 14.31  18.49  10.99   2.47   1.627     0.6   7.3
    11/11/10 15:01:15 14.29  18.47  10.98   2.47   1.620     0.7  11.5
    11/11/10 15:01:16 14.28  18.48  10.99   2.48   1.659     0.8   7.0
    11/11/10 15:01:17 14.27  18.49  10.99   2.48   1.682     1.4  14.4
    11/11/10 15:01:18 14.26  18.49  11.00   2.48   1.724     1.0   2.9
    11/11/10 15:01:19 14.27  18.52  11.01   2.48   1.756     0.8  13.5
    11/11/10 15:01:20 14.28  18.52  11.01   2.47   1.752     5.3  11.7
    11/11/10 15:01:21 14.29  18.52  11.02   2.47   1.841     0.8   5.8
    11/11/10 15:01:22 14.30  18.52  11.01   2.47   1.789     1.0   5.5
    11/11/10 15:01:23 14.31  18.52  11.01   2.47   1.868     0.7   6.8
    11/11/10 15:01:24 14.31  18.52  11.02   2.47   1.848     0.8   7.8
    11/11/10 15:01:25 14.32  18.52  11.01   2.47   1.896     0.3   8.3
    11/11/10 15:01:26 14.32  18.52  11.01   2.47   1.923     0.9   4.8
    11/11/10 15:01:27 14.32  18.51  11.01   2.47   1.936     0.5   6.4
    11/11/10 15:01:28 14.32  18.52  11.01   2.46   1.960     0.9  10.0
    11/11/10 15:01:29 14.31  18.52  11.01   2.46   1.996     0.6  10.7
    11/11/10 15:01:30 14.31  18.52  11.01   2.47   2.024     1.7  11.8
    11/11/10 15:01:31 14.31  18.52  11.01   2.47   2.031     1.0  11.7
    11/11/10 15:01:32 14.31  18.53  11.02   2.46   2.110     1.3   5.4
    11/11/10 15:01:33 14.32  18.52  11.01   2.46   2.067     0.6  12.2
    11/11/10 15:01:34 14.32  18.52  11.01   2.46   2.144     0.4   6.4
    11/11/10 15:01:35 14.32  18.51  11.01   2.46   2.148     1.0   4.6
    11/11/10 15:01:36 14.33  18.51  11.01   2.46   2.172     0.9   9.6
    11/11/10 15:01:37 14.33  18.52  11.01   2.46   2.221     1.0   6.5
    11/11/10 15:01:38 14.33  18.51  11.01   2.46   2.219     0.3   7.6
    11/11/10 15:01:39 14.33  18.51  11.01   2.46   2.278     1.2   8.1
    11/11/10 15:01:40 14.32  18.51  11.01   2.46   2.258     0.5   0.6
    11/11/10 15:01:41 14.32  18.52  11.01   2.46   2.329     1.2   8.2
    11/11/10 15:01:42 14.31  18.51  11.01   2.46   2.321     1.1   9.6
    11/11/10 15:01:43 14.31  18.51  11.01   2.46   2.382     1.0   5.3
    11/11/10 15:01:44 14.31  18.51  11.01   2.46   2.357     0.7   8.5
    11/11/10 15:01:45 14.31  18.52  11.01   2.46   2.449     0.4  10.3
    11/11/10 15:01:46 14.31  18.52  11.01   2.46   2.430     0.6  10.0
    11/11/10 15:01:47 14.31  18.52  11.01   2.46   2.472     0.6  11.3
    11/11/10 15:01:48 14.31  18.52  11.01   2.45   2.510     1.2   8.5
    11/11/10 15:01:49 14.31  18.51  11.01   2.45   2.516     0.7   9.5
    11/11/10 15:01:50 14.31  18.52  11.01   2.45   2.529     0.5   9.6
    11/11/10 15:01:51 14.31  18.52  11.01   2.45   2.575     0.7   8.2
    11/11/10 15:01:52 14.31  18.51  11.01   2.46   2.578     0.5   9.4
    11/11/10 15:01:53 14.31  18.51  11.01   2.46   2.592     0.8   5.5
    11/11/10 15:01:54 14.30  18.51  11.01   2.46   2.666     0.6   7.1
    11/11/10 15:01:55 14.30  18.51  11.01   2.46   2.603     0.7  11.5
    11/11/10 15:01:56 14.29  18.52  11.01   2.45   2.707     0.9   7.2
    11/11/10 15:01:57 14.29  18.52  11.01   2.45   2.673     0.7   9.2
    11/11/10 15:01:58 14.28  18.52  11.01   2.45   2.705     0.7   6.4
    11/11/10 15:01:59 14.28  18.52  11.01   2.45   2.720     1.3   6.8
    11/11/10 15:02:00 14.28  18.52  11.02   2.45   2.778     0.7   7.5
    11/11/10 15:02:01 14.27  18.52  11.02   2.45   2.724     0.5   8.0
    11/11/10 15:02:02 14.27  18.51  11.01   2.45   2.840     0.9  10.0
    11/11/10 15:02:03 14.26  18.52  11.02   2.45   2.758     0.8   6.4
    11/11/10 15:02:04 14.26  18.52  11.01   2.46   2.874     0.4   9.7
    11/11/10 15:02:05 14.24  18.53  11.02   2.46   2.824     1.1  10.8
    11/11/10 15:02:06 14.24  18.53  11.02   2.46   2.896     1.0   8.8
    11/11/10 15:02:07 14.22  18.53  11.02   2.47   2.903     0.6  16.3
    11/11/10 15:02:08 14.22  18.54  11.03   2.45   2.912     0.9   9.6
    11/11/10 15:02:09 14.21  18.54  11.02   2.45   2.949     0.8   6.6
    11/11/10 15:02:10 14.20  18.54  11.03   2.45   2.964     1.4   8.4
    11/11/10 15:02:11 14.19  18.55  11.03   2.46   2.966     3.0  12.9
    11/11/10 15:02:12 14.17  18.55  11.03   2.45   3.020     1.0   7.5
    11/11/10 15:02:13 14.15  18.56  11.04   2.45   3.000     1.1   9.5
    11/11/10 15:02:14 14.14  18.56  11.04   2.45   3.064     0.9   6.5
    11/11/10 15:02:15 14.13  18.56  11.04   2.45   3.037     1.3   8.2
    11/11/10 15:02:16 14.13  18.57  11.04   2.45   3.097     1.3   7.7
    11/11/10 15:02:17 14.12  18.57  11.05   2.45   3.128     1.5   8.4
    11/11/10 15:02:18 14.11  18.58  11.05   2.45   3.104     1.7   7.0
    11/11/10 15:02:19 14.10  18.58  11.05   2.45   3.190     1.2  10.2
    11/11/10 15:02:20 14.10  18.58  11.05   2.44   3.141     5.8   9.9
    11/11/10 15:02:21 14.09  18.60  11.06   2.44   3.199     1.4   4.7
    11/11/10 15:02:22 14.07  18.60  11.07   2.44   3.208     1.6   9.4
    11/11/10 15:02:23 14.06  18.60  11.07   2.44   3.199     2.1   6.2
    11/11/10 15:02:24 14.06  18.62  11.08   2.43   3.259     3.0   9.3
    11/11/10 15:02:25 14.05  18.63  11.08   2.43   3.228     1.6   8.9
    11/11/10 15:02:26 14.06  18.63  11.08   2.43   3.289     1.6   3.5
    11/11/10 15:02:27 14.05  18.64  11.09   2.43   3.278     1.8   2.2
    11/11/10 15:02:28 14.05  18.64  11.09   2.43   3.307     2.2   9.7
    11/11/10 15:02:29 14.04  18.64  11.09   2.43   3.315     2.3   5.5
    11/11/10 15:02:30 14.04  18.65  11.10   2.43   3.367     2.1   5.1
    11/11/10 15:02:31 14.03  18.65  11.10   2.43   3.297     2.5   8.5
    11/11/10 15:02:32 14.03  18.65  11.10   2.41   3.419     1.9   6.8
    11/11/10 15:02:33 14.03  18.65  11.10   2.41   3.347     2.1   4.0
    11/11/10 15:02:34 14.03  18.66  11.10   2.41   3.405     2.0  11.8
    11/11/10 15:02:35 14.03  18.67  11.11   2.41   3.420     2.4  10.6
    11/11/10 15:02:36 14.03  18.67  11.11   2.39   3.369     2.7  10.5
    11/11/10 15:02:37 14.02  18.67  11.11   2.39   3.402     1.6   9.1
    11/11/10 15:02:38 14.02  18.66  11.11   2.39   3.408     1.9   8.5
    11/11/10 15:02:39 14.02  18.67  11.11   2.39   3.362     4.2   7.0
    11/11/10 15:02:40 14.02  18.67  11.11   2.38   3.421     2.3  12.1
    11/11/10 15:02:41 14.02  18.67  11.11   2.38   3.371     2.6  14.7
    11/11/10 15:02:42 14.02  18.67  11.11   2.38   3.409     3.0   6.5
    11/11/10 15:02:43 14.02  18.67  11.11   2.38   3.368     2.3   2.5
    11/11/10 15:02:44 14.02  18.67  11.11   2.37   3.434     2.5  10.2
    11/11/10 15:02:45 14.02  18.67  11.11   2.37   3.346     1.6   4.5

It was not a very interesting day from a data perspective either.
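For comparison with the least-squares idea above, the existing bottom-strike rule can be sketched in Python. This is a hypothetical re-creation based only on the one-sentence description of the current system (the TattleTale BASIC source is not shown). `fixed_interval_stop`, `x` and `min_change` are illustrative names standing in for whatever interval and threshold the controller actually uses.

```python
def fixed_interval_stop(depths, x, min_change):
    """Return the sample index at which the simple rule stops the winch, or None.

    Hypothetical re-creation of the existing rule: every x samples, compare the
    current depth with the depth x samples earlier and stop if the increase is
    smaller than min_change.
    """
    for i in range(x, len(depths), x):
        if depths[i] - depths[i - x] < min_change:
            return i  # depth did not increase enough over the last x samples
    return None  # never triggered
```

Because each comparison uses only two raw samples, a single wave-induced wobble can trip it. In the data above, for example, DEPTH goes from 2.874 at 15:02:04 to 2.824 at 15:02:05 while the package is still descending, so with x = 1 the rule would stop the winch there for any positive `min_change`. A larger x reduces that sensitivity, but a genuine stall can then take up to roughly 2*x samples to be noticed.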

RyanN asked Jun 15 '11 at 18:06





3 Answers

Your approach (comparing current derivative to mean derivative) is good, but could be improved. Most importantly, you really need to see your data before deciding how to analyze it:

[Figure: plot of data, derivatives, and power spectra (panels A to F)]

These plots show:

- A) Your original data. Note: the rate of descent is not constant because the spool diameter changes, so a linear regression is probably not optimal. There is also extra data at the beginning, while the spool is stopped, that could throw off your slope measurements.
- B) The derivative of your data. This is the data you are using to do your detection. Tell me: can you easily see the region where the average slope goes to zero?
- C) The FFT of your data, showing a lot of power in the upper half of the frequency range; this is where your noise lies. Since the noise occupies only the upper half of the frequency range, it should be fairly easy to filter out.
- D) Your data after going through a Gaussian low-pass filter with sigma=1.0 (scipy.ndimage.gaussian_filter(data, 1.0)).
- E) The derivative of D (the bottom is much easier to see in this data).
- F) The power spectrum of E, showing the noise mostly removed.
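The filtering behind panels D to F can be reproduced offline with a few lines of NumPy. The sketch below is a NumPy-only stand-in for `scipy.ndimage.gaussian_filter`: it uses the same `truncate=4.0` kernel radius and mirrored-edge padding as SciPy's default `'reflect'` mode, so results should match closely, but the function names are illustrative and not from the original answer.

```python
import numpy as np

def gaussian_smooth(data, sigma=1.0, truncate=4.0):
    """Gaussian low-pass filter (panel D), mirroring the data at both edges."""
    data = np.asarray(data, dtype=float)
    radius = int(truncate * sigma + 0.5)
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (x / sigma) ** 2)
    kernel /= kernel.sum()  # unit gain, so a constant depth stays unchanged
    padded = np.pad(data, radius, mode="symmetric")
    return np.convolve(padded, kernel, mode="valid")  # same length as data

def smoothed_rate(depths, sigma=1.0):
    """Derivative of the smoothed depth (panel E), in depth units per sample."""
    return np.diff(gaussian_smooth(depths, sigma))

def power_spectrum(signal):
    """One-sided power spectrum of a mean-removed signal (panels C and F)."""
    signal = np.asarray(signal, dtype=float)
    return np.abs(np.fft.rfft(signal - signal.mean())) ** 2
```

With `depths` holding the DEPTH column, comparing `power_spectrum(np.diff(depths))` with `power_spectrum(smoothed_rate(depths, 1.0))` should show the upper half of the frequency range being suppressed, which is what makes the bottom easier to see.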

So by filtering a little, it becomes fairly easy to visually detect the bottom. The questions, then, are 1) how to translate 'visual detection' into a reliable algorithm and 2) how to determine the optimal value of sigma. If sigma is too small, the noise gets in your way; if it is too large, the spool may run too long. The only way to answer either question is empirically: pull out as many of these data sets as you can and try new ideas until you find one that works for most, if not all, of your data sets.
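One way to make that empirical comparison systematic is to replay archived runs through each candidate detector and tally the outcomes. A minimal sketch, assuming each run is stored as a list of depth samples plus the (manually identified) sample index of touchdown, or `None` for runs like the one in the question with no bottom strike. The detector interface, an object whose hypothetical `update(depth)` method returns `True` to stop, is an assumption made for illustration.

```python
def evaluate(make_detector, runs):
    """Replay recorded runs through a detector and tally the outcomes.

    make_detector: zero-argument callable returning a fresh detector with a
        (hypothetical) update(depth) -> bool method; True means "stop the winch".
    runs: list of (depths, bottom_index) pairs; bottom_index is the sample at
        which the package actually touched down, or None if it never did.
    Returns (false_stops, missed_strikes, mean_delay_in_samples or None).
    """
    false_stops = 0  # stopped before touchdown, or on a run with no strike
    missed = 0       # touched down but the detector never fired
    delays = []      # samples between touchdown and the stop command
    for depths, bottom_index in runs:
        detector = make_detector()
        # Index of the first sample for which the detector says "stop"
        stop_index = next((i for i, d in enumerate(depths) if detector.update(d)), None)
        if stop_index is None:
            if bottom_index is not None:
                missed += 1
        elif bottom_index is None or stop_index < bottom_index:
            false_stops += 1
        else:
            delays.append(stop_index - bottom_index)
    mean_delay = sum(delays) / len(delays) if delays else None
    return false_stops, missed, mean_delay
```

Running this for several values of sigma (or of a threshold) makes the trade-off concrete: too little smoothing shows up as false stops, too much as a longer mean delay.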

My first approach would be something like:

- Low-pass the data as it arrives, using empirically pre-determined parameters. If this parameter is selected correctly, it should only be necessary to consider the very last data point that has arrived.
- Trigger when you find a point (in the filtered derivative) that is close to zero, within some (empirically determined) threshold.
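The two steps above could look something like the following streaming detector. This is a sketch under assumptions not stated in the answer: one depth sample per call (1 Hz), a Gaussian-weighted average of the most recent first differences as the low-pass filter, and an "arming" rate so the stopped-spool period at the start of a cast cannot trigger it. `BottomDetector`, `stop_rate` and `arm_rate` are hypothetical names, and both rates are placeholders to be determined empirically.

```python
import math
from collections import deque

class BottomDetector:
    """Hypothetical streaming bottom-strike detector (a sketch, not a tested design)."""

    def __init__(self, stop_rate, arm_rate, sigma=1.0, truncate=4.0):
        radius = int(truncate * sigma + 0.5)
        weights = [math.exp(-0.5 * (x / sigma) ** 2) for x in range(-radius, radius + 1)]
        total = sum(weights)
        self.weights = [w / total for w in weights]  # normalized Gaussian window
        # n weighted first differences need n + 1 depth samples
        self.depths = deque(maxlen=len(self.weights) + 1)
        self.stop_rate = stop_rate  # smoothed rate (depth per sample) treated as "stopped"
        self.arm_rate = arm_rate    # rate that must be seen first, so the idle start is ignored
        self.armed = False

    def update(self, depth):
        """Feed one depth sample; return True if the winch should be stopped."""
        self.depths.append(depth)
        if len(self.depths) < self.depths.maxlen:
            return False  # not enough history yet
        d = list(self.depths)
        diffs = [b - a for a, b in zip(d, d[1:])]  # oldest to newest
        rate = sum(w * v for w, v in zip(self.weights, diffs))
        if not self.armed:
            self.armed = rate > self.arm_rate
            return False
        return rate < self.stop_rate
```

Because a centered Gaussian needs samples on both sides, this version effectively estimates the rate about radius + 0.5 samples in the past (roughly 4.5 s at sigma = 1.0 and 1 Hz). A larger sigma suppresses more noise but adds more delay, which is the "spool may run too long" side of the trade-off.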

There's a lot you could do to make this more 'clever', like automatically selecting a threshold based on prior noise. However, such tricks can be quite difficult to implement properly because they can be fooled by unexpected input. You are almost always better off applying what you already know about the data rather than asking the computer to guess for you.

Luke answered Oct 03 '22 at 12:10



If you really need to turn it off as fast as you can, you could use a machine learning approach, learning the minimum allowed_slope_error value from many samples of your runs.

Or you could just try it out several times and see which value gives you the best result on average :)
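The "try it out several times" idea can be made systematic for the question's own detector: replay runs known not to have hit bottom (such as the sample above) through the same overall-versus-recent slope comparison and record the largest difference seen; `allowed_slope_error` must sit above that value to avoid false stops. This is a simple empirical calibration rather than a machine-learning model, and the function names and the `margin` factor are assumptions for illustration.

```python
import numpy as np

def lstsq_slope(t, y):
    """Least-squares slope of y against t, as in the question's np.linalg.lstsq call."""
    A = np.vstack([t, np.ones(len(t))]).T
    return np.linalg.lstsq(A, y, rcond=None)[0][0]

def max_slope_drop(times, depths, running_size=10):
    """Largest overall_slope - recent_slope seen when replaying one run sample by sample."""
    worst = -np.inf
    for n in range(running_size, len(times) + 1):
        overall = lstsq_slope(times[:n], depths[:n])
        recent = lstsq_slope(times[n - running_size:n], depths[n - running_size:n])
        worst = max(worst, overall - recent)
    return worst

def calibrate_threshold(no_strike_runs, running_size=10, margin=1.2):
    """Set allowed_slope_error a safety margin above the largest drop seen in runs
    known not to have hit bottom (margin=1.2 is an arbitrary illustrative choice)."""
    worst = max(max_slope_drop(t, d, running_size) for t, d in no_strike_runs)
    return margin * max(worst, 0.0)
```

Runs recorded on rough days matter most here, since they produce the largest drops; a threshold calibrated only on calm days would likely give false stops in rough weather.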

rafalotufo answered Oct 03 '22 at 11:10



My guess is that you don't want loads of cable being dumped on top of the device that you're lowering, and you also don't want it to land too hard. Assuming the physics governing your problem is fairly well known, then depending on the noise, you may be able to implement a simple feedback controller. If you're able to control the winch_velocity, then a simple PID controller may well do the trick.
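A minimal PID controller is only a few lines. In this sketch the controlled quantity is assumed to be cable slack (cable paid out, integrated from winch_velocity, minus sensor_depth, assuming winch_velocity is a line speed in the same units as depth), with a setpoint near zero. The controller output is assumed to be a correction added to a nominal winch_velocity. The class name, the gains, and that mapping are all hypothetical and would need tuning on the real winch.

```python
class PID:
    """Textbook PID controller with output clamping (a sketch, not tuned for a real winch)."""

    def __init__(self, kp, ki, kd, out_min=float("-inf"), out_max=float("inf")):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.out_min, self.out_max = out_min, out_max
        self.integral = 0.0
        self.prev_error = None

    def update(self, setpoint, measurement, dt):
        """Return the clamped control output for one time step of length dt."""
        error = setpoint - measurement
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        integral = self.integral + error * dt
        output = self.kp * error + self.ki * integral + self.kd * derivative
        if self.out_min <= output <= self.out_max:
            self.integral = integral  # only accumulate when not saturated (simple anti-windup)
        return min(max(output, self.out_min), self.out_max)
```

At 1 Hz one might call `correction = pid.update(0.0, slack, 1.0)` each second and command `nominal_velocity + correction` (both names hypothetical): when the package stops on the bottom, slack grows, the error goes negative, and the commanded velocity drops.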

If you're only able to switch the winch on or off, then it'll be trickier (as in "a lot more maths") to implement a controller that makes a perfect landing, since you'd have to switch the winch on and off to control the rate of descent.
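For comparison, the simplest on/off scheme is a relay with hysteresis (a deadband): keep the winch running while slack stays below an upper limit, stop it once slack exceeds that limit, and restart only after slack falls below a lower limit, so it does not chatter on wave noise. This is a swapped-in illustration, not the "perfect landing" controller described above; the slack signal, the class name, and both limits are assumptions.

```python
class HysteresisSwitch:
    """Hypothetical on/off winch logic with a deadband between two slack limits."""

    def __init__(self, lower, upper):
        if lower >= upper:
            raise ValueError("lower limit must be below upper limit")
        self.lower, self.upper = lower, upper
        self.winch_on = True  # assume the cast starts with the winch paying out

    def update(self, slack):
        """Return True if the winch should be running after this slack reading."""
        if self.winch_on and slack > self.upper:
            self.winch_on = False  # too much cable out: stop paying out
        elif not self.winch_on and slack < self.lower:
            self.winch_on = True   # slack taken up again: resume
        return self.winch_on
```

The gap between the two limits sets how much slack noise is tolerated before the state flips; it does nothing to soften the touchdown itself.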

My personal experience was that the basics of control theory were a bit easier to get the hang of than the basics of machine learning, so I think it's worth trying it out on your control problem.

Rob answered Oct 03 '22 at 13:10
