I am currently trying to implement basic speech recognition in AS3. It needs to be completely client side, so I can't use powerful server-side speech recognition tools. My idea was to detect the syllables in a word and use that to determine which word was spoken. I am aware that this will greatly limit what can be recognized, but I only need to recognize a few key words, and I can make sure they all have a different number of syllables.
I am currently able to generate a 1D array of voice levels for a spoken word, and when I plot it I can clearly see distinct peaks for the syllables in most cases. However, I am completely stuck on how to find those peaks. I only really need the count, but I suppose that comes with finding them. At first I thought of grabbing a few maximum values and comparing them with the average of the values, but I had forgotten that one peak can be bigger than the others, so all my "peaks" ended up located on one actual peak.
I stumbled onto some Matlab code that looks almost too short to be true, but I can't verify that, as I am unable to convert it to any language I know. I tried AS3 and C#. So I am wondering if you guys could start me on the right path, or if you have any pseudo-code for peak detection?
Find the start and the end of each peak by comparing the current value to the median of the data. If the current value is smaller than the median but the next one is bigger, a peak starts. The opposite holds for the end: if the current value is higher than the median but the next one is smaller, the peak ends.
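The median-crossing idea can be sketched in a few lines. This is only a rough illustration in Python rather than AS3, and the function name `find_peaks_by_median` is made up for the example. Two details are my own assumptions, not part of the description above: a value exactly equal to the median is treated as "not above" it, and a signal that starts or ends above the median does not yield a peak at that edge.

```python
from statistics import median


def find_peaks_by_median(levels):
    """Return (start, end) index pairs for runs of samples above the median."""
    if not levels:
        return []
    threshold = median(levels)
    peaks = []
    start = None  # index where the currently open peak began, if any
    for i in range(len(levels) - 1):
        current, nxt = levels[i], levels[i + 1]
        if current <= threshold < nxt:
            # Rising through the median: a peak starts at the next sample.
            start = i + 1
        elif current > threshold >= nxt and start is not None:
            # Falling back through the median: the peak ends at this sample.
            peaks.append((start, i))
            start = None
    return peaks


# Hypothetical voice-level data with three bumps.
levels = [0, 1, 5, 9, 4, 1, 0, 2, 7, 8, 3, 0, 1, 6, 2, 0]
peaks = find_peaks_by_median(levels)
syllable_count = len(peaks)
```

The number of syllables would then be `len(peaks)`. Noisy data that hovers around the median will produce many short runs, so it may be worth smoothing the array first or discarding runs shorter than a few samples.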
The Matlab code is pretty straightforward. I'll try to translate it into something more pseudocode-like.
It should be easy to translate to ActionScript or C#. Try it yourself and post follow-up questions with your code if you get stuck; that way you'll learn the most.
Param: delta (a tolerance that depends on your data; try out different values)

min = Inf (or some very high value)
max = -Inf (or some very low value)
lookformax = 1
for every datapoint d [0..maxdata] in array arr do
    this = arr[d]
    if this > max
        max = this
        maxpos = d
    endif
    if this < min
        min = this
        minpos = d
    endif
    if lookformax == 1
        if this < max-delta
            there's a maximum at position maxpos
            min = this
            minpos = d
            lookformax = 0
        endif
    else
        if this > min+delta
            there's a minimum at position minpos
            max = this
            maxpos = d
            lookformax = 1
        endif
    endif
endfor
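
For anyone who wants to try the pseudocode before porting it, here is a line-by-line transcription into Python. It is only a sketch: the function name `find_peaks_delta` and the sample data are invented for illustration, and the choice of returning two lists of indices is my own. The variables are called `hi` and `lo` instead of `max` and `min` to avoid shadowing Python built-ins.

```python
import math


def find_peaks_delta(arr, delta):
    """Return (maxima, minima) as lists of indices into arr.

    A maximum is only reported once the signal has dropped by more than
    delta below it, and a minimum once it has risen by more than delta.
    """
    maxima, minima = [], []
    lo, hi = math.inf, -math.inf
    lo_pos = hi_pos = None
    look_for_max = True
    for d, value in enumerate(arr):
        # Track the running extremes since the last reported peak or valley.
        if value > hi:
            hi, hi_pos = value, d
        if value < lo:
            lo, lo_pos = value, d
        if look_for_max:
            if value < hi - delta:
                maxima.append(hi_pos)      # there's a maximum at hi_pos
                lo, lo_pos = value, d      # restart the search for a minimum
                look_for_max = False
        else:
            if value > lo + delta:
                minima.append(lo_pos)      # there's a minimum at lo_pos
                hi, hi_pos = value, d      # restart the search for a maximum
                look_for_max = True
    return maxima, minima


# Hypothetical voice-level data with three bumps.
levels = [0, 1, 5, 9, 4, 1, 0, 2, 7, 8, 3, 0, 1, 6, 2, 0]
maxima, minima = find_peaks_delta(levels, delta=3)
syllable_count = len(maxima)
```

For the syllable question only `len(maxima)` matters. A larger `delta` ignores small ripples on top of a syllable, while a `delta` that is too large will merge neighbouring syllables, so the value has to be tuned against real recordings.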