
Binary classification of sensor data

My problem is the following: I need to classify a data stream coming from a sensor. I have obtained a baseline by taking the median of a window, and I subtract that baseline from the values (I want to avoid negative peaks, so I use only the absolute value of the difference).
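
A minimal sketch of the baseline step described above, assuming a trailing window. The function name `baseline_deviation` and the window length of 5 are illustrative choices, not taken from the actual setup:

```python
from statistics import median


def baseline_deviation(samples, window=5):
    """Absolute deviation of each sample from a rolling-median baseline.

    Assumption: the baseline at index i is the median of the trailing
    `window` samples ending at i (fewer at the start of the stream).
    """
    deviations = []
    for i, x in enumerate(samples):
        start = max(0, i - window + 1)
        base = median(samples[start:i + 1])
        deviations.append(abs(x - base))
    return deviations


# A single spike stands out while the flat parts stay at zero.
print(baseline_deviation([10, 10, 10, 50, 10, 10], window=5))
```

The window has to be clearly longer than an event, otherwise the median starts to follow the event itself and the deviation shrinks.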

Now I need to distinguish an event (= something triggered the sensor) from the noise near the baseline:

enter image description here

The problem is that I don't know which method to use. I have thought of several approaches:

  • sum up the values in a window, if the sum is above a threshold the class should be EVENT ('Integrate and dump')
  • sum up the differences of the values in a window and get the mean value (which gives something like the first derivative), if the value is positive and above a threshold set class EVENT, set class NO-EVENT otherwise
  • combination of both

(unfortunately these approaches have the drawback that I need to guess the threshold values and set the window size)

  • using an SVM that learns from manually classified data (but I don't know how to set this algorithm up properly: which features should I look at? The median/mean of a window? The integral? The first derivative?)
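
For illustration, the two window-based approaches above might look like the sketch below. It assumes non-overlapping windows; the function names, the window length and the thresholds are all placeholders that would still have to be tuned:

```python
def integrate_and_dump(deviations, window, threshold):
    """Sum each non-overlapping window; EVENT if the sum exceeds threshold."""
    labels = []
    for start in range(0, len(deviations) - window + 1, window):
        total = sum(deviations[start:start + window])
        labels.append("EVENT" if total > threshold else "NO-EVENT")
    return labels


def mean_difference(values, window, threshold):
    """Mean of successive differences per window (needs window >= 2).

    EVENT only if the mean difference is positive and above threshold.
    """
    labels = []
    for start in range(0, len(values) - window + 1, window):
        chunk = values[start:start + window]
        diffs = [b - a for a, b in zip(chunk, chunk[1:])]
        slope = sum(diffs) / len(diffs)
        labels.append("EVENT" if slope > 0 and slope > threshold else "NO-EVENT")
    return labels


print(integrate_and_dump([0, 1, 0, 1, 5, 6, 7, 5, 1, 0, 0, 1], window=4, threshold=10))
print(mean_difference([0, 0, 1, 0, 0, 3, 6, 9, 9, 5, 2, 0], window=4, threshold=1))
```

Note that the sum of successive differences telescopes to (last value - first value), so the second approach only looks at the two ends of each window and is sensitive to noise on those two samples.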

What would you suggest? Are there better/simpler methods to get this task done?

I know there are a lot of sophisticated algorithms, but I'm confused about which would be the best way. Please have a little patience with a newbie who has no machine learning/DSP background :)

Thank you a lot and best regards.

CShor asked Apr 14 '26 14:04


2 Answers

The key to evaluating your heuristic is to develop a model of the behaviour of the system.

For example, what is the model of the physical process you are monitoring? Do you expect your samples, for example, to be correlated in time?

What is the model for the sensor output? Can it be modelled as, for example, a discretized linear function of the voltage? Is there a noise component? Is the magnitude of the noise known or unknown but constant?

Once you've listed your knowledge of the system that you're monitoring, you can then use that to evaluate and decide upon a good classification system. You may then also get an estimate of its accuracy, which is useful for consumers of the output of your classifier.

Edit:

Given the more detailed description, I'd suggest trying some simple models of behaviour that can be tackled using classical techniques before moving to a generic supervised learning heuristic.

For example, suppose:

  • The baseline, event threshold and noise magnitude are all known a priori.

  • The underlying process can be modelled as a Markov chain: it has two states (off and on) and the transition times between them are exponentially distributed.

You could then use a hidden Markov model (HMM) approach to determine the most likely underlying state at any given time. Even when the noise parameters and thresholds are unknown, you can use the HMM forward-backward training method (the Baum-Welch algorithm) to train the parameters (e.g. mean, variance of a Gaussian) associated with the output for each state.
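
As a rough sketch of the known-parameter case, here is Viterbi decoding for a two-state HMM with Gaussian outputs. Viterbi returns the single most likely state sequence, which is closely related to, but not the same as, the per-sample posterior from forward-backward. The function names, the stay probability `p_stay` and the example means and standard deviations are assumptions made for illustration. In discrete time, a constant `p_stay` gives geometrically distributed dwell times, the sampled counterpart of the exponential assumption.

```python
import math


def gaussian_logpdf(x, mean, std):
    """Log density of a normal distribution."""
    return -0.5 * math.log(2 * math.pi * std * std) - (x - mean) ** 2 / (2 * std * std)


def viterbi_two_state(samples, means, stds, p_stay=0.95):
    """Most likely off (0) / on (1) sequence for a two-state Gaussian HMM.

    Assumptions: both states are equally likely at the start and share
    the same probability `p_stay` of remaining in the current state.
    """
    if not samples:
        return []
    log_trans = [[math.log(p_stay), math.log(1 - p_stay)],
                 [math.log(1 - p_stay), math.log(p_stay)]]
    score = [math.log(0.5) + gaussian_logpdf(samples[0], means[s], stds[s])
             for s in (0, 1)]
    backptrs = []
    for x in samples[1:]:
        new_score, ptr = [], []
        for s in (0, 1):
            # Best predecessor for state s at this time step.
            prev = max((0, 1), key=lambda p: score[p] + log_trans[p][s])
            ptr.append(prev)
            new_score.append(score[prev] + log_trans[prev][s]
                             + gaussian_logpdf(x, means[s], stds[s]))
        score = new_score
        backptrs.append(ptr)
    # Trace back from the best final state.
    state = max((0, 1), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(backptrs):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


print(viterbi_two_state([0.1, -0.2, 0.0, 5.1, 4.8, 5.3, 0.2, -0.1],
                        means=(0.0, 5.0), stds=(1.0, 1.0)))
```

Because every state change costs log((1 - p_stay) / p_stay), a single moderately high sample is not enough to flip the state, which gives some smoothing without an explicit window.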

If you know even more about the events, you can get by with simpler approaches: for example, if you knew that the event signal always reached a level above the baseline + noise, and that events were always separated in time by an interval larger than the width of the event itself, you could just do a simple threshold test.
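
Under those assumptions, the simple threshold test could be sketched as follows. The name `find_events` and the threshold value are hypothetical; in practice the threshold would be set somewhat above the known noise level:

```python
def find_events(deviations, threshold):
    """Return (start, end) index pairs of runs strictly above threshold.

    `end` is exclusive, so deviations[start:end] is the event itself.
    """
    events = []
    start = None
    for i, d in enumerate(deviations):
        if d > threshold and start is None:
            start = i                    # rising edge: event begins
        elif d <= threshold and start is not None:
            events.append((start, i))    # falling edge: event ends
            start = None
    if start is not None:                # stream ended during an event
        events.append((start, len(deviations)))
    return events


print(find_events([0, 1, 6, 7, 1, 0, 8, 9], threshold=5))
```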

Edit:

The classic intro to HMMs is Rabiner's tutorial (a copy can be found here). Relevant also are these errata.

halfflat answered Apr 17 '26 05:04


From your description, a correctly parameterized moving average might be sufficient.
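
A moving average in its simplest form, as a sketch (trailing window; the function name and the window length are illustrative and would need tuning to the sensor):

```python
def moving_average(values, window):
    """Trailing moving average; uses fewer samples at the start."""
    averaged = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1):i + 1]
        averaged.append(sum(chunk) / len(chunk))
    return averaged


print(moving_average([1, 2, 3, 4, 5], window=3))
```

The smoothed signal can then be compared against a threshold. A longer window suppresses more noise but reacts later to an event.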

  • Try to understand the sensor and its output. Build a model and a simulator that provides mock data covering the expected signals, including noise.
  • Record lots of real sensor data.
  • Visualize the data and verify your assumptions and model.
  • Annotate your sensor data, i.e. generate ground truth (your simulator should do that for the mock data).
  • From what you have learned so far, propose one or more algorithms.
  • Build a test system that can verify your algorithms against ground truth and run regression comparisons against previous runs.
  • Implement your proposed algorithms and run them against ground truth.
  • Try to understand the false positives and false negatives in the recorded data (and try to adapt your simulator to reproduce them).
  • Adapt your algorithm(s).
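
The "verify against ground truth" step can be as small as the sketch below, which assumes one 0/1 label per sample for both the detector output and the annotation (the function name and the dictionary keys are made up for this example):

```python
def score_against_truth(predicted, truth):
    """Count per-sample hits and misses and derive precision and recall."""
    tp = sum(1 for p, t in zip(predicted, truth) if p and t)
    fp = sum(1 for p, t in zip(predicted, truth) if p and not t)
    fn = sum(1 for p, t in zip(predicted, truth) if not p and t)
    tn = sum(1 for p, t in zip(predicted, truth) if not p and not t)
    return {
        "tp": tp, "fp": fp, "fn": fn, "tn": tn,
        # Report 0.0 when a ratio is undefined (nothing predicted/present).
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


print(score_against_truth([1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 1, 0]))
```

Storing these numbers for every run makes the regression comparison against previous runs straightforward.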

Some other tips:

  • You may implement hysteresis on thresholds to avoid bouncing.
  • You may implement delays to avoid bouncing.
  • Beware of the delay that debouncers and low-pass filters introduce.
  • You may implement multiple algorithms and let them vote.
  • To test relative improvements, you may run regression tests on large amounts of unannotated data, then inspect only the detections that flipped between runs to see whether performance increased or decreased.
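
A sketch of threshold hysteresis, assuming two hypothetical levels `high` (to switch on) and `low` (to switch off):

```python
def hysteresis_detect(values, high, low):
    """Switch on at >= high, switch off only at <= low."""
    active = False
    states = []
    for v in values:
        if not active and v >= high:
            active = True
        elif active and v <= low:
            active = False
        states.append(active)
    return states


signal = [0, 4, 6, 4, 6, 4, 1, 0]
print(hysteresis_detect(signal, high=5, low=2))
# For comparison, a single threshold at 5 toggles on every dip:
print([v >= 5 for v in signal])
```

With the two levels, the dips to 4 in the middle of the event no longer end the detection; only the drop to 1 does.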

stefan answered Apr 17 '26 06:04


