The issue at hand looks easy, but I could not find an easy solution so far.
I've got a histogram describing the value distributing of an array of floats, roughly looking like this:
As you can see, there is a local maximum near 0, which keeps falling down to a local minimum, then rising quickly to a plateau, and in the end falling to 0. I would like to detect the local minimum.
In practice, the histogram is not as smooth:
There are lots of spikes, and the local minimum may be stretched and uneven. I'm not sure how to tackle this problem.
There is little domain knowledge. The first max may even be higher than the second max. There may be spikes in any direction, values may be as low as 0.
This is a real life sample taken from 8 distinct runs. It's scaled to 0 - 10 to make it easier to understand.
0: 22% 12% 19% 17% 6% 5% 6% 5%
1: 3% 2% 1% 1% 4% 1% 4% 1%
2: 6% 2% 13% 5% 0% 2% 0% 2%
3: 62% 62% 52% 42% 2% 5% 2% 5%
4: 4% 19% 12% 28% 10% 13% 10% 13%
5: 0% 0% 3% 29% 30% 29% 30%
6: 37% 31% 37% 30%
7: 1% 7% 1% 7%
8: 6% 1% 6% 1%
9:
10:
Values rounded down. Missing values denote no occurrence of any value.
Explanation of the first line:
0: 22% the initial max
1: 3% local min
2: 6% still min
3: 62% plateau max
4: 4% second min
5: 0% 0
6: no more values
7:
8:
9:
10:
For reference, a list of the same data, this time scaled to 0 - 100 (there were no values in the 90-100 range at all). I messed up on the formatting, but it should give a rough idea.
0: 0% 0% 0% 1% 0% 0% 0% 0%
1: 0% 1% 1% 3% 0% 0% 0% 0%
2: 1% 2% 1% 3% 0% 0% 0% 0%
3: 4% 2% 3% 3% 0% 1% 0% 1%
4: 6% 1% 3% 2% 0% 0% 0% 0%
5: 2% 0% 3% 1% 0% 0% 0% 0%
6: 1% 0% 2% 0% 0% 0% 0% 0%
7: 1% 0% 1% 0% 0% 0% 0% 0%
8: 1% 0% 1% 0% 0% 0% 0% 0%
9: 1% 0% 1% 0% 1% 0% 1% 0%
10: 1% 0% 0% 0% 1% 0% 1% 0%
11: 0% 0% 0% 0% 0% 0% 0% 0%
12: 0% 0% 0% 0% 0% 0% 0% 0%
13: 0% 0% 0% 0% 0% 0% 0% 0%
14: 0% 0% 0% 0% 0% 0% 0% 0%
15: 0% 0% 0% 0% 0% 0% 0% 0%
16: 0% 0% 0% 0% 0% 0% 0% 0%
17: 0% 0% 0% 0% 0% 0% 0% 0%
18: 0% 0% 0% 0% 0% 0% 0% 0%
19: 0% 0% 0% 0% 0% 0% 0% 0%
20: 0% 0% 0% 0% 0% 0% 0% 0%
21: 0% 0% 0% 0% 0% 0% 0% 0%
22: 0% 0% 0% 0% 0% 0% 0% 0%
23: 0% 0% 0% 0% 0% 0% 0% 0%
24: 0% 0% 1% 0% 0% 0% 0% 0%
25: 0% 0% 1% 0% 0% 0% 0% 0%
26: 0% 0% 1% 0% 0% 0% 0% 0%
27: 0% 0% 1% 0% 0% 0% 0% 0%
28: 1% 0% 2% 1% 0% 0% 0% 0%
29: 3% 0% 2% 2% 0% 0% 0% 0%
30: 7% 1% 3% 2% 0% 0% 0% 0%
31: 10% 2% 4% 3% 0% 0% 0% 0%
32: 10% 3% 4% 4% 0% 0% 0% 0%
33: 6% 6% 5% 5% 0% 0% 0% 0%
34: 5% 5% 4% 4% 0% 0% 0% 0%
35: 5% 8% 6% 3% 0% 0% 0% 0%
36: 5% 10% 6% 4% 0% 0% 0% 0%
37: 5% 9% 5% 3% 0% 0% 0% 0%
38: 3% 8% 5% 5% 0% 0% 0% 0%
39: 2% 5% 5% 5% 0% 0% 0% 0%
40: 1% 4% 4% 5% 0% 1% 0% 1%
41: 1% 3% 2% 5% 0% 1% 0% 1%
42: 0% 1% 1% 4% 0% 0% 0% 0%
43: 0% 2% 0% 4% 1% 1% 1% 1%
44: 0% 1% 0% 3% 1% 1% 1% 1%
45: 0% 1% 0% 1% 0% 1% 0% 1%
46: 0% 1% 0% 1% 1% 1% 1% 1%
47: 0% 1% 0% 0% 1% 1% 1% 1%
48: 0% 1% 0% 0% 1% 1% 1% 1%
50: 0% 0% 0% 1% 1% 1% 1% 1%
50: 0% 1% 1% 1% 1% 1%
51: 0% 0% 2% 1% 2% 1%
52: 0% 1% 2% 1% 2% 1%
53: 0% 0% 4% 2% 4% 2%
54: 0% 2% 2% 2% 2%
55: 0% 2% 2% 2% 2%
56: 0% 2% 3% 2% 3%
57: 0% 2% 4% 2% 4%
58: 4% 6% 4% 6%
59: 3% 3% 3% 3%
60: 5% 5% 5% 5%
61: 5% 7% 5% 7%
62: 3% 5% 3% 5%
63: 4% 3% 4% 3%
64: 5% 2% 5% 2%
65: 3% 2% 2% 2%
66: 5% 1% 5% 1%
67: 1% 0% 1% 0%
68: 1% 0% 1% 0%
69: 0% 1% 0% 1%
70: 0% 0% 0% 0%
71: 0% 0% 0% 0%
72: 0% 0% 0% 0%
73: 0% 1% 0% 1%
74: 0% 0% 0% 0%
75: 0% 0% 0% 0%
76: 0% 1% 0% 1%
77: 0% 0% 0% 0%
78: 0% 0% 0% 0%
79: 0% 0% 0% 0%
80: 0% 0% 0% 1%
81: 0% 0% 0% 0%
82: 0% 0% 0% 0%
83: 0% 0% 0% 0%
84: 0% 0% 0% 0%
85: 1% 1%
86: 0% 0%
87: 1% 1%
88: 1% 1%
89: 0% 0%
Substitute the value of x in the function and find the value where the function has either minimum values or maximum values. In order to find whether the point is local/global minima or maxima, take the second-order derivative and determine whether the value is positive or negative.
Your "true" histogram is low frequency. Your noise is high frequency. Low-pass filtering the data with an appropriate bandwidth filter will get rid of most of the noise.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With