I've got a whole bunch of data (10,000 to 50,000 values for each series of measurements) and I'm interested in automatically identifying local maxima/minima in the density estimate of the distribution of these values. I assume that there should usually be two peaks separated by a pit, and I'd like to find that pit in order to split the data into two parts for further processing. If possible, I'd also like to know where the peaks are located.
As the density estimate may contain very small local changes, I'd like to be able to adjust the "sensitivity". The best I could find so far was this solution by @Tommy: https://stackoverflow.com/a/6836924/1003358 Here is an example:
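library(ggplot2)
# localMaxima() is the helper from the linked answer; it returns the indices of the local maxima
d <- density(faithful$eruptions, bw = "sj")
loc.max <- d$x[localMaxima(d$y)]
ggplot(faithful, aes(eruptions)) + geom_density(adjust = 1/2) +
  geom_vline(xintercept = loc.max, col = "red") +  # geom_vline() takes xintercept, not x
  xlab("Measured values")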
Now, my data are much noisier:
d <- density(my.df$Values, bw = "sj")
loc.max <- d$x[localMaxima(d$y)]
ggplot(my.df, aes(Values)) + geom_density(adjust = 1/2) +
  geom_vline(xintercept = loc.max, col = "red") +
  xlab("Measured values")
Trying to adjust the parameters (note that two "unwanted" peaks in the tail have been found):
d <- density(my.df$Values, bw = "nrd", adjust = 1.2)
loc.max <- d$x[localMaxima(d$y)]
ggplot(my.df, aes(Values)) + geom_density(adjust = 1/2) +
  geom_vline(xintercept = loc.max, col = "red") +
  xlab("Measured values")
So the questions are:
1) How to automatically identify real peaks within such a noisy dataset?
2) How to reliably find the pits that separate those peaks?
A local minimum or maximum is found by differentiating the function and looking for the turning points, where the slope is zero. A local minimum is a point in the domain at which the function takes its smallest value within some neighborhood of that point.
x = k is a point of local maximum if f'(k) = 0 and f''(k) < 0; f(k) is then called the local maximum value of f(x). Likewise, x = k is a point of local minimum if f'(k) = 0 and f''(k) > 0.
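On a density estimate there is no closed-form derivative, so a discrete version of the slope condition can be used: a sign change in the first difference of `d$y` marks a turning point. The following is only a rough sketch in base R under that assumption. The helper name `find_turning_points` is made up for illustration, the built-in `faithful` data stands in for the real measurements, and the code assumes at least two peaks exist on the grid and that the curve has no exactly flat stretches.

```r
# Hypothetical helper: locate turning points of a density estimate from the
# sign changes of its first difference (a discrete stand-in for f'(k) = 0).
find_turning_points <- function(d) {
  s <- sign(diff(d$y))             # +1 where the curve rises, -1 where it falls
  change <- diff(s)                # non-zero where the slope changes sign
  list(
    peaks = which(change < 0) + 1, # rising then falling: local maximum
    pits  = which(change > 0) + 1  # falling then rising: local minimum
  )
}

d  <- density(faithful$eruptions, bw = "sj")
tp <- find_turning_points(d)

# Keep the two highest peaks (in left-to-right order) ...
top2 <- sort(tp$peaks[order(d$y[tp$peaks], decreasing = TRUE)][1:2])
# ... and the lowest pit lying between them
between <- tp$pits[tp$pits > top2[1] & tp$pits < top2[2]]
pit <- between[which.min(d$y[between])]

peak.x <- d$x[top2]  # locations of the two main peaks
pit.x  <- d$x[pit]   # candidate split point between them
```

Picking the two highest peaks is one simple way to ignore small bumps in the tails; whether height is the right criterion for a given dataset is a judgment call.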
My favorite is pastecs::turnpoints. But you're correct that you'll have to do some subjective filtering to distinguish spiky noise from true peaks. One way to do this is to require either the raw or splined data to remain above some threshold for N consecutive values.
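The "above some threshold for N consecutive values" idea could be sketched with `rle()` in base R, as below. This does not use pastecs. The helper name `filter_peaks`, the threshold of 10% of the highest density value, and the run length of 20 grid points are all assumptions chosen for illustration, and `faithful` again stands in for the real data.

```r
# Hypothetical helper: keep only the peaks that sit inside a run of at least
# `n.min` consecutive grid points whose value exceeds `threshold`.
filter_peaks <- function(y, peaks, threshold, n.min) {
  runs   <- rle(y > threshold)          # runs of above/below threshold
  ends   <- cumsum(runs$lengths)        # last index of each run
  starts <- ends - runs$lengths + 1     # first index of each run
  keep   <- runs$values & runs$lengths >= n.min
  ok <- vapply(peaks,
               function(p) any(keep & starts <= p & p <= ends),
               logical(1))
  peaks[ok]
}

d <- density(faithful$eruptions, bw = "sj")
peaks <- which(diff(sign(diff(d$y))) < 0) + 1  # candidate local maxima

# Assumed settings: 10% of the highest density value, 20 grid points
good <- filter_peaks(d$y, peaks, threshold = 0.1 * max(d$y), n.min = 20)
loc.max <- d$x[good]
```

Both settings act as the "sensitivity" knob: raising the threshold or the required run length discards more of the small bumps, at the risk of dropping a genuine but low peak.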