De-spiking a non-periodic signal

Tags: python, plot, signal-processing

I am working on a set of data (x={time}, y={measure}) that comes from an instrument. Sometimes the source causes a spike in the data, which produces an incorrect plot and can cause mistakes when calculating features like max and min.

So I need to remove these spikes from my data, for example the spikes circled in red in the image:

![image link](https://i.stack.imgur.com/d8Xu2.png)

I have found this example for de-spiking, but I don't know how to invert the signal (or whether the approach is correct for a non-symmetric signal). I also think it only detects the spikes, whereas I need to remove them with operations like fitting.

I would like to know whether there are better ways to accomplish my task, or whether I simply have to adapt the example above to my situation (in which case I'll need help, because I have no idea how to do it).


asked Feb 12 '18 17:02

francesco

1 Answer

Here is a set of steps you can follow to estimate the location of peaks:

1. Smooth the data. Any number of filters are available for this. An excellent starting point is the `smooth` function described in the scipy cookbook. It will be up to you to select the appropriate parameters, such as window size:
   ```
   baseline = smooth(data, ...)
   ```
2. Treat the smoothed data as a baseline, sort of like a best-fit line in the absence of a known fitting function. Subtract the baseline from the data:
   ```
   noise = data - baseline
   ```
3. The result is essentially a rough estimate of the noise about your pseudo-fit. Set a threshold and chop off the parts where the noise is too large:
   ```
   import numpy as np

   threshold = 3.0 * np.std(noise)   # flag deviations beyond 3 standard deviations
   mask = np.abs(noise) > threshold  # True where a point is a suspected spike
   ```
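The three steps above can be sketched end to end as follows. This is only a minimal illustration under stated assumptions: `moving_average` is a hypothetical stand-in for the cookbook's `smooth` function (a plain centered moving average with reflected edges), and the synthetic signal, window size of 11, and spike positions are made up for the demonstration.

```python
import numpy as np

def moving_average(y, window=11):
    """Hypothetical stand-in for smooth(): centered moving average.

    Assumes an odd window so the output has the same length as y.
    """
    y = np.asarray(y, dtype=float)
    half = window // 2
    padded = np.pad(y, half, mode="reflect")  # mirror the edges
    kernel = np.ones(window) / window
    return np.convolve(padded, kernel, mode="valid")

def find_spikes(y, window=11, factor=3.0):
    """Return (mask, baseline); mask is True at suspected spikes."""
    baseline = moving_average(y, window)      # step 1: smooth
    noise = y - baseline                      # step 2: subtract baseline
    threshold = factor * np.std(noise)        # step 3: threshold
    return np.abs(noise) > threshold, baseline

# Synthetic, non-periodic test signal (assumed for illustration only)
rng = np.random.default_rng(0)
t = np.linspace(0, 10, 500)
data = np.exp(-0.3 * t) + 0.01 * rng.standard_normal(t.size)
data[[100, 300]] += 2.0                       # inject two artificial spikes

mask, baseline = find_spikes(data)
print(np.flatnonzero(mask))                   # indices flagged as spikes
```

Note that a moving average is itself pulled toward each spike, so the baseline near a flagged point is slightly biased. A median-based filter is a common alternative if that matters for your data.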

There are plenty of configuration options to play with here: smoothing filter type and window size, threshold factor, and even the metric. For example, you can use the IQR or something entirely different instead of the standard deviation. What you do with the masked points is also entirely up to you. Common options are to discard them entirely or to replace them with the baseline values.
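As a rough sketch of those options, the snippet below swaps the standard-deviation threshold for an IQR-based fence and shows both ways of handling the masked points. The function names (`iqr_mask`, `discard`, `replace_with_baseline`), the fence factor of 3.0, and the synthetic data are all assumptions made for illustration; the baseline here is simply taken as known rather than computed by a smoothing filter.

```python
import numpy as np

def iqr_mask(noise, factor=3.0):
    """Flag points outside [Q1 - factor*IQR, Q3 + factor*IQR].

    The factor is a tuning knob; 3.0 is an assumed, fairly strict choice.
    """
    q1, q3 = np.percentile(noise, [25, 75])
    iqr = q3 - q1
    return (noise < q1 - factor * iqr) | (noise > q3 + factor * iqr)

def discard(t, y, mask):
    """Option 1: drop the flagged samples from both arrays."""
    return t[~mask], y[~mask]

def replace_with_baseline(y, baseline, mask):
    """Option 2: keep the length, substituting baseline values at spikes."""
    return np.where(mask, baseline, y)

# Synthetic example (assumed data, with the baseline known exactly)
rng = np.random.default_rng(1)
t = np.arange(200.0)
baseline = np.sin(t / 20.0)
y = baseline + 0.05 * rng.standard_normal(t.size)
y[50] += 3.0                                  # one artificial spike

mask = iqr_mask(y - baseline)
t_kept, y_kept = discard(t, y, mask)
y_clean = replace_with_baseline(y, baseline, mask)
```

Discarding leaves gaps in the time axis, which is fine for plotting against `t_kept` but not for code that assumes uniform sampling. Replacing keeps the array length unchanged.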


answered Sep 28 '22 03:09

Mad Physicist
