Continuous speech recognition while singing?

As part of my application I'm looking to add speech recognition, but not really in the traditional sense. I have a bunch of lyrics (divided into verses) that are sung by someone, and the idea is to find what verse is currently being sung so it can be displayed on screen.

I've played around with sphinx and got some basic examples set up and working, but while there seems to be plenty of documentation around on registering spoken text where you can wait for a delay then process the result, I can't find much on the idea of recognising sentences continuously. This is of course before I get to the part where the words are being sung and not spoken!

Has anyone got any experience with this, and if so is there anywhere that would provide a good starting point? Or is what I'm trying to achieve way too ambitious with sphinx and is it never really going to work properly? I'm open to looking at other libraries but they must be free, and sphinx was the most widely talked about one I could dig up.

Michael Berry asked Aug 23 '11 13:08



1 Answer

It's perfectly possible to recognize speech as it is pronounced, with only a small delay, and it gets easier if you know roughly what you expect to hear. This is called a "partial result" and is available in all CMUSphinx decoders through the API. Basically, you can retrieve the hypothesis while decoding is still in progress.
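Since the expected text is known in advance (the verses), a partial hypothesis can be scored against each verse to guess which one is being sung. The following is only a rough sketch in plain Python, not part of CMUSphinx: `tokenize`, `best_verse` and the sample verses are hypothetical names and data made up for illustration, and the partial hypothesis is assumed to arrive as a plain string from whatever decoder you use.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def best_verse(partial_hypothesis, verses):
    """Return (index, score) of the verse that shares the most words
    with the hypothesis. The score is the fraction of hypothesis words
    found in the verse; (None, 0.0) means nothing matched."""
    hyp_words = set(tokenize(partial_hypothesis))
    best_index, best_score = None, 0.0
    for index, verse in enumerate(verses):
        verse_words = set(tokenize(verse))
        if not hyp_words or not verse_words:
            continue
        score = len(hyp_words & verse_words) / len(hyp_words)
        if score > best_score:
            best_index, best_score = index, score
    return best_index, best_score

# Hypothetical lyrics, one string per verse.
verses = [
    "amazing grace how sweet the sound that saved a wretch like me",
    "twas grace that taught my heart to fear and grace my fears relieved",
]

index, score = best_verse("grace that taught my heart", verses)
```

Word overlap is a deliberately crude measure. It ignores word order and will struggle when verses share a chorus, so in practice you would probably also want a minimum score before switching the displayed verse.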

One issue to consider is how to stabilize this result (how to extract the stable part of it), but this technique is called backtracking and can be implemented easily.
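To give an idea of what "extracting the stable part" can look like, here is a minimal sketch of a simpler heuristic than decoder-level backtracking: keep the last few partial hypotheses and treat only the word prefix they all agree on as stable. `PrefixStabilizer` and its `window` parameter are hypothetical names for illustration, and the hypotheses are assumed to be space-separated strings.

```python
from collections import deque

class PrefixStabilizer:
    """Tracks recent partial hypotheses and reports the word prefix
    that has stayed identical across all of them."""

    def __init__(self, window=3):
        # Only the most recent `window` hypotheses are kept.
        self.recent = deque(maxlen=window)

    def update(self, hypothesis):
        """Add a new partial hypothesis and return the stable words."""
        self.recent.append(hypothesis.split())
        if len(self.recent) < self.recent.maxlen:
            return []  # not enough history yet to call anything stable
        stable = []
        # zip stops at the shortest hypothesis, so the stable prefix
        # can never be longer than any hypothesis in the window.
        for words in zip(*self.recent):
            if all(word == words[0] for word in words):
                stable.append(words[0])
            else:
                break
        return stable

stabilizer = PrefixStabilizer(window=3)
for partial in ["amazing", "amazing grace", "amazing grace how"]:
    stable_words = stabilizer.update(partial)
```

A larger window gives a steadier result at the cost of more delay before a word is accepted, which matters if the verse display has to keep up with the singer.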

For singing, provided the music can be filtered out, it's also doable.

Nikolay Shmyrev answered Sep 21 '22 15:09