Binary classification of sensor data

Question

My problem is the following: I need to classify a data stream coming from a sensor. I have managed to get a baseline using the median of a window, and I subtract that baseline from the values (I want to avoid negative peaks, so I only use the absolute value of the difference).
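For illustration, the baseline step described above could look roughly like this. It is only a minimal sketch: the trailing (causal) window, the function name `abs_deviation_from_baseline` and the window size are assumptions, not details from the post.

```python
from statistics import median

def abs_deviation_from_baseline(samples, window=5):
    """Return |x - rolling median| for each sample.

    The baseline at index i is the median of the last `window` samples
    up to and including i (fewer samples at the start of the stream).
    """
    out = []
    for i, x in enumerate(samples):
        start = max(0, i - window + 1)
        baseline = median(samples[start:i + 1])
        out.append(abs(x - baseline))
    return out

# Hypothetical data: a flat signal around 10 with one spike.
signal = [10, 10, 11, 10, 30, 10, 9, 10]
print(abs_deviation_from_baseline(signal, window=5))
```

Note that an event longer than about half the window drags the median up with it, so the window has to be clearly longer than the events you expect.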

Now I need to distinguish an event (= something triggered the sensor) from the noise near the baseline:

[image: plot of the sensor signal]

The problem is that I don't know which method to use. I have thought of several approaches:

sum up the values in a window; if the sum is above a threshold, the class should be EVENT ('integrate and dump')
sum up the differences of the values in a window and take the mean (which gives something like the first derivative); if the value is positive and above a threshold, set class EVENT, otherwise set class NO-EVENT
a combination of both

(unfortunately these approaches have the drawback that I need to guess the threshold values and set the window size)
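The threshold-based approaches listed above can be sketched as follows. This is a rough illustration only: the non-overlapping windows, the function names and the choice to combine the two tests with OR are assumptions (an AND combination would be just as valid a reading of "combination of both").

```python
def window_sum_is_event(window, sum_threshold):
    """'Integrate and dump': EVENT if the sum over the window exceeds the threshold."""
    return sum(window) > sum_threshold

def mean_slope_is_event(window, slope_threshold):
    """EVENT if the mean first difference is positive and above the threshold."""
    if len(window) < 2:
        return False
    diffs = [b - a for a, b in zip(window, window[1:])]
    mean_slope = sum(diffs) / len(diffs)
    return mean_slope > 0 and mean_slope > slope_threshold

def classify_windows(deviation, size, sum_threshold, slope_threshold):
    """Label each non-overlapping window of `size` samples.

    Assumption: the two tests are combined with OR.
    """
    labels = []
    for start in range(0, len(deviation) - size + 1, size):
        w = deviation[start:start + size]
        is_event = (window_sum_is_event(w, sum_threshold)
                    or mean_slope_is_event(w, slope_threshold))
        labels.append("EVENT" if is_event else "NO-EVENT")
    return labels
```

One thing the sketch makes visible: the differences telescope, so the mean difference equals (last value - first value) / (window length - 1) and ignores everything in between.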

using an SVM that learns from manually classified data (but I don't know how to set this algorithm up properly: which features should I look at? median/mean of a window? integral? first derivative? ...)
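As a possible starting point for the SVM idea, here is a sketch that turns manually labelled windows into feature vectors using exactly the candidate features named above. The function names and the 1/0 label encoding are hypothetical, and the classifier itself is left out.

```python
from statistics import mean, median

def window_features(window):
    """Candidate features from the question: median, mean, integral, mean first difference."""
    diffs = [b - a for a, b in zip(window, window[1:])]
    return [
        median(window),
        mean(window),
        sum(window),                    # discrete integral, assuming unit sample spacing
        mean(diffs) if diffs else 0.0,  # average slope over the window
    ]

def build_training_set(labelled_windows):
    """labelled_windows: iterable of (window, label) pairs, label 1 = EVENT, 0 = NO-EVENT."""
    X, y = [], []
    for window, label in labelled_windows:
        X.append(window_features(window))
        y.append(label)
    return X, y
```

The resulting `X` and `y` have the shape a typical SVM implementation expects, for example `sklearn.svm.SVC().fit(X, y)` if scikit-learn is available. Scaling the features to comparable ranges first is usually advisable.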

What would you suggest? Are there better/simpler methods to get this task done?

I know there are a lot of sophisticated algorithms, but I'm confused about what the best way could be. Please have a little patience with a newbie who has no machine learning/DSP background :)

Thank you a lot and best regards.

halfflat · Accepted Answer

The key to evaluating your heuristic is to develop a model of the behaviour of the system.

For example, what is the model of the physical process you are monitoring? Do you expect your samples, for example, to be correlated in time?

What is the model for the sensor output? Can it be modelled as, for example, a discretized linear function of the voltage? Is there a noise component? Is the magnitude of the noise known or unknown but constant?

Once you've listed your knowledge of the system that you're monitoring, you can then use that to evaluate and decide upon a good classification system. You may then also get an estimate of its accuracy, which is useful for consumers of the output of your classifier.

Edit:

Given the more detailed description, I'd suggest trying some simple models of behaviour that can be tackled using classical techniques before moving to a generic supervised learning heuristic.

For example, suppose:

The baseline, event threshold and noise magnitude are all known a priori.
The underlying process can be modelled as a Markov chain: it has two states (off and on) and the transition times between them are exponentially distributed.

You could then use a hidden Markov model (HMM) approach to determine the most likely underlying state at any given time. Even when the noise parameters and thresholds are unknown, you can use the HMM forward-backward (Baum-Welch) training method to train the parameters (e.g. mean, variance of a Gaussian) associated with the output for each state.
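To make the HMM idea concrete, here is a minimal sketch of Viterbi decoding for a two-state model with Gaussian outputs, assuming all parameters are already known. It works in discrete time, where a constant probability of staying in a state gives geometrically distributed dwell times (the discrete counterpart of the exponential assumption). The function names and the default `p_stay` and `p_start` values are purely illustrative.

```python
import math

def gaussian_logpdf(x, mu, sigma):
    """Log density of a normal distribution with mean mu and standard deviation sigma."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)

def viterbi_two_state(obs, means, sigmas, p_stay=(0.95, 0.9), p_start=(0.5, 0.5)):
    """Most likely state sequence (0 = off, 1 = on) for a 2-state Gaussian HMM."""
    if not obs:
        return []
    log_trans = [
        [math.log(p_stay[0]), math.log(1 - p_stay[0])],
        [math.log(1 - p_stay[1]), math.log(p_stay[1])],
    ]
    # score[s] = best log-probability of any state path ending in state s
    score = [math.log(p_start[s]) + gaussian_logpdf(obs[0], means[s], sigmas[s])
             for s in (0, 1)]
    back = []  # back[t][s] = best predecessor of state s at step t + 1
    for x in obs[1:]:
        new_score, pointers = [], []
        for s in (0, 1):
            prev = max((0, 1), key=lambda r: score[r] + log_trans[r][s])
            pointers.append(prev)
            new_score.append(score[prev] + log_trans[prev][s]
                             + gaussian_logpdf(x, means[s], sigmas[s]))
        score = new_score
        back.append(pointers)
    # Trace the best path backwards from the best final state.
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path

# Hypothetical data: baseline near 0, event level near 5, unit noise.
obs = [0.1, 0.3, 0.2, 4.8, 5.1, 5.3, 0.2, 0.1]
print(viterbi_two_state(obs, means=(0.0, 5.0), sigmas=(1.0, 1.0)))
```

Because every state change costs some probability, an isolated noisy sample tends to be absorbed into the surrounding state instead of being reported as a one-sample event.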

If you know even more about the events, you can get by with simpler approaches: for example, if you knew that the event signal always reached a level above the baseline + noise, and that events were always separated in time by an interval larger than the width of the event itself, you could just do a simple threshold test.
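Under those stronger assumptions the threshold test is only a few lines. In this sketch the input is assumed to be the baseline-corrected signal, and the function name and the (start, end) output format are illustrative choices.

```python
def threshold_events(deviation, threshold):
    """Return (start, end) index pairs (end exclusive) of runs above the threshold."""
    events = []
    start = None
    for i, x in enumerate(deviation):
        if x > threshold and start is None:
            start = i                    # rising edge: an event begins
        elif x <= threshold and start is not None:
            events.append((start, i))    # falling edge: the event ends
            start = None
    if start is not None:                # signal ended while still above threshold
        events.append((start, len(deviation)))
    return events

# Hypothetical data with two well-separated events.
print(threshold_events([0, 1, 6, 7, 1, 0, 8, 9], threshold=5))
```

The threshold would sit somewhere between the noise level and the smallest expected event amplitude, for example a few noise standard deviations above the baseline.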

Edit:

The classic intro to HMMs is Rabiner's tutorial, "A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition" (a copy can be found here). Relevant also are these errata.

stefan · Answer

From your description, a correctly parameterized moving average might be sufficient.
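A moving-average detector could be sketched like this. The trailing window, the function names and the comparison against a single fixed threshold are assumptions; "correctly parameterized" here comes down to choosing `window` and `threshold`.

```python
from collections import deque

def moving_average(samples, window):
    """Yield the mean of the most recent `window` samples (fewer at the start)."""
    buf = deque(maxlen=window)
    total = 0.0
    for x in samples:
        if len(buf) == window:
            total -= buf[0]   # the oldest sample is about to be evicted
        buf.append(x)
        total += x
        yield total / len(buf)

def detect(samples, window, threshold):
    """Per-sample flag: True where the moving average exceeds the threshold."""
    return [avg > threshold for avg in moving_average(samples, window)]

# Hypothetical baseline-corrected data with one event.
print(detect([0, 0, 10, 10, 0, 0], window=2, threshold=4))
```

A longer window suppresses more noise but makes the detector respond later, both when an event starts and when it ends.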

Try to understand the sensor and its output. Make a model and build a simulator that provides mock data covering the expected data, including noise and all that stuff
Get lots of real sensor data recorded
visualize the data and verify your assumptions and model
annotate your sensor data, i.e. generate ground truth (your simulator should do that for the mock data)
from what you learned till now propose one or more algorithms
make a test system that can verify your algorithms against ground truth and do regression against previous runs
implement your proposed algorithms and run them against ground truth
try to understand the false positives and false negatives from the recorded data (and try to adapt your simulator to reproduce them)
adapt your algorithm(s)
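The simulator and the test against ground truth from the steps above might start out like this. It is only a sketch: rectangular events of fixed length, Gaussian noise and per-sample scoring are simplifying assumptions, and all names and default values are made up.

```python
import random

def simulate(n, event_prob=0.02, event_len=10, amplitude=5.0, noise_sd=1.0, seed=0):
    """Return (samples, truth): a noisy baseline with rectangular events,
    plus the per-sample ground truth (True while an event is active)."""
    rng = random.Random(seed)   # fixed seed so test runs are reproducible
    samples, truth = [], []
    remaining = 0               # samples left in the current event
    for _ in range(n):
        if remaining == 0 and rng.random() < event_prob:
            remaining = event_len
        on = remaining > 0
        if on:
            remaining -= 1
        samples.append((amplitude if on else 0.0) + rng.gauss(0.0, noise_sd))
        truth.append(on)
    return samples, truth

def confusion_counts(predicted, truth):
    """Per-sample counts of true/false positives and negatives."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for p, t in zip(predicted, truth):
        key = ("t" if p == t else "f") + ("p" if p else "n")
        counts[key] += 1
    return counts

samples, truth = simulate(1000)
predicted = [x > 2.5 for x in samples]   # stand-in for the algorithm under test
print(confusion_counts(predicted, truth))
```

Storing these counts for every run gives you the regression baseline to compare later versions of an algorithm against.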

some other tips

you may implement hysteresis on thresholds to avoid bouncing
you may implement delays to avoid bouncing
beware of delays if implementing debouncers or low pass filters
you may implement multiple algorithms and voting
for testing relative improvements, you may run regression tests on large amounts of unannotated data; you then only need to check the detections that flipped to find a performance increase/decrease
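Several of these tips fit in a few lines each. The following is a sketch under assumed conventions: per-sample boolean detections, hypothetical function names, and a strict majority for voting.

```python
def hysteresis(samples, on_threshold, off_threshold):
    """Switch on above on_threshold, switch off only below the lower off_threshold."""
    state = False
    out = []
    for x in samples:
        if not state and x > on_threshold:
            state = True
        elif state and x < off_threshold:
            state = False
        out.append(state)
    return out

def debounce(flags, hold):
    """Accept a state change only after `hold` consecutive samples disagree with
    the current state. This delays every transition by hold - 1 samples."""
    state = False
    run = 0
    out = []
    for f in flags:
        if f != state:
            run += 1
            if run >= hold:
                state, run = f, 0
        else:
            run = 0
        out.append(state)
    return out

def majority_vote(*detections):
    """Combine the per-sample outputs of several detectors by strict majority."""
    return [sum(votes) * 2 > len(votes) for votes in zip(*detections)]

def flipped(old, new):
    """Indices where two runs disagree, i.e. the detections worth inspecting by hand."""
    return [i for i, (a, b) in enumerate(zip(old, new)) if a != b]

# Hypothetical signal hovering around a single threshold of 5.
signal = [0, 6, 4, 6, 4, 1, 0]
print([x > 5 for x in signal])                                # plain threshold
print(hysteresis(signal, on_threshold=5, off_threshold=2))    # two thresholds
```

Comparing the two printed lists shows the bouncing that hysteresis removes, and `debounce` makes the delay mentioned above explicit.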

Tags: algorithm, machine-learning, classification, svm, signal-processing

CShor
