 

JavaScript Audio API -- analyze an audio file to detect exact sounds, for lip syncing

I've seen things like waveform.js, which uses the Web Audio API to display waveform data, and there are many other tools out there that can analyze the exact sound points of an audio file in JavaScript.

If so, it should be possible to use that kind of analysis for real-time lip syncing in JavaScript, i.e., to get an animated character to speak at the same time the user is speaking, simply by using an audio context and somehow reading the data points to find the right sounds.
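For concreteness, here's a minimal sketch of the "audio context + data points" part using the standard Web Audio API (the `onFrame` name is just illustrative). This gets you raw spectra in real time; turning those spectra into phonemes is the part the question is really about:

```javascript
// Minimal sketch: capture the microphone and read raw frequency data each frame.
// Note: some browsers require a user gesture before the AudioContext will start.
const audioCtx = new AudioContext();

navigator.mediaDevices.getUserMedia({ audio: true }).then((stream) => {
  const source = audioCtx.createMediaStreamSource(stream);
  const analyser = audioCtx.createAnalyser();
  analyser.fftSize = 2048;               // ~46 ms windows at 44.1 kHz
  source.connect(analyser);

  const bins = new Uint8Array(analyser.frequencyBinCount);

  function onFrame() {
    analyser.getByteFrequencyData(bins); // magnitude spectrum for the current window
    // `bins` is just a spectrum -- deciding which *phoneme* it represents
    // is the hard part.
    requestAnimationFrame(onFrame);
  }
  onFrame();
});
```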

So the question becomes, more specifically:

How exactly do I analyze audio data to extract what exact sounds are made at specific timestamps?

I want to get the end result of something like Rhubarb Lip Sync, except with JavaScript, and in real time. It doesn't have to be exact, but as close as possible.
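For reference, Rhubarb Lip Sync outputs a timeline of a small set of named mouth shapes, so the end result here would essentially be a mapping from whatever phoneme (or phoneme class) is detected in the current frame to one of those shapes. The mapping below is purely illustrative and not Rhubarb's actual rules:

```javascript
// Hypothetical phoneme -> mouth-shape mapping, in the spirit of Rhubarb's output.
// The groupings are illustrative; a real mapping needs tuning per character rig.
const MOUTH_SHAPES = {
  A: ['p', 'b', 'm'],          // closed lips
  B: ['k', 's', 't', 'iy'],    // slightly open, teeth visible
  C: ['eh', 'ae'],             // open
  D: ['aa'],                   // wide open
  E: ['ao', 'er'],             // rounded
  F: ['uw', 'ow', 'w'],        // puckered
  X: ['sil'],                  // silence / rest
};

function shapeForPhoneme(phoneme) {
  for (const [shape, phones] of Object.entries(MOUTH_SHAPES)) {
    if (phones.includes(phoneme)) return shape;
  }
  return 'X'; // default to rest when unsure
}
```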

asked Mar 05 '20 by B''H Bi'ezras -- Boruch Hashem


1 Answer

There is no algorithm that allows you to detect phonemes correctly 100% of the time.

You didn't say whether this was for real-time use or for offline use, but that would strongly affect which algorithm you'd use.

An algorithm based on mel-frequency cepstral coefficients (MFCCs) would be expected to give you about 80% accuracy, which would be good enough for video games or the like.

Deep learning systems based on convolutional neural nets would give you excellent recognition, but they are not real time systems (yet).

You could maybe start with Meyda, for example, and compare the audio features of the signal you're listening to against a human-cataloged library of audio features for each phoneme.
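A rough sketch of that idea, using Meyda's createMeydaAnalyzer for real-time MFCC extraction and a simple nearest-neighbor match against a hand-built catalog. The PHONEME_TEMPLATES object and the matching logic are assumptions for illustration, not something Meyda ships with:

```javascript
// Sketch: extract MFCCs in real time with Meyda and match them against
// a pre-recorded, hand-labeled catalog of reference MFCC vectors per phoneme.
import Meyda from 'meyda';

const PHONEME_TEMPLATES = {
  // 'aa': [[ /* 13 MFCC values */ ], /* more examples... */],
  // 'p':  [[ /* 13 MFCC values */ ]],
};

// Euclidean distance between two equal-length MFCC vectors.
function euclidean(a, b) {
  return Math.sqrt(a.reduce((sum, v, i) => sum + (v - b[i]) ** 2, 0));
}

// Return the catalog phoneme whose template is closest to the observed MFCCs.
function closestPhoneme(mfcc) {
  let best = { phoneme: null, distance: Infinity };
  for (const [phoneme, templates] of Object.entries(PHONEME_TEMPLATES)) {
    for (const template of templates) {
      const d = euclidean(mfcc, template);
      if (d < best.distance) best = { phoneme, distance: d };
    }
  }
  return best.phoneme;
}

const audioCtx = new AudioContext();
navigator.mediaDevices.getUserMedia({ audio: true }).then((stream) => {
  const source = audioCtx.createMediaStreamSource(stream);
  const analyzer = Meyda.createMeydaAnalyzer({
    audioContext: audioCtx,
    source,
    bufferSize: 512,
    featureExtractors: ['mfcc', 'rms'],
    callback: ({ mfcc, rms }) => {
      if (rms < 0.01) return;          // treat very quiet frames as silence
      const phoneme = closestPhoneme(mfcc);
      // drive the character's mouth shape from `phoneme` here
    },
  });
  analyzer.start();
});
```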

answered Sep 30 '22 by johnwbyrd