
Continuously listen to the user's voice and detect end-of-speech silence in the SpeechKit framework

I am working on an application that needs to open a screen in response to a voice command; for example, if the user says "Open Setting", it should open the settings screen. So far I have used the SpeechKit framework, but I am not able to detect the silence at the end of speech the way Siri does. I want to detect when the user has finished a sentence or phrase.

Below is my code, where I have integrated the SpeechKit framework in two ways.

A) Via the closure-based API: recognitionTask(with request: SFSpeechRecognitionRequest, resultHandler: @escaping (SFSpeechRecognitionResult?, Error?) -> Swift.Void) -> SFSpeechRecognitionTask

let audioEngine = AVAudioEngine()
let speechRecognizer = SFSpeechRecognizer()
let request = SFSpeechAudioBufferRecognitionRequest()
var recognitionTask: SFSpeechRecognitionTask?

func startRecording() throws {

        let node = audioEngine.inputNode
        let recordingFormat = node.outputFormat(forBus: 0)

        node.installTap(onBus: 0, bufferSize: 1024,
                        format: recordingFormat) { [unowned self]
                            (buffer, _) in
                            self.request.append(buffer)
        }

        audioEngine.prepare()
        try audioEngine.start()

        weak var weakSelf = self

        recognitionTask = speechRecognizer?.recognitionTask(with: request) {
            (result, error) in

            if result != nil {

                if let transcription = result?.bestTranscription {
                    weakSelf?.idenifyVoiceCommand(transcription)
                }
            }
        }            
}

But when I say a word or sentence such as "Open Setting", the recognitionTask(with:) closure is called multiple times, and so is idenifyVoiceCommand, which I call inside the closure. How can I make sure it is called only once?
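The repeated callbacks are the recognizer delivering partial transcriptions of the same utterance. One possible way to act on a command only once is to remember which commands have already been handled, as in the sketch below. The names VoiceCommand and CommandGate and the command phrases are hypothetical, introduced only for illustration; they are not part of the Speech framework.

```swift
import Foundation

/// Hypothetical commands for illustration; a real app would define its own.
enum VoiceCommand: String, CaseIterable {
    case openSetting = "open setting"
    case openProfile = "open profile"
}

/// Reports each command at most once per utterance, even though the
/// recognizer delivers many partial transcriptions of the same speech.
final class CommandGate {
    private var handled: Set<VoiceCommand> = []

    /// Returns only the commands that appear in `transcript`
    /// and have not been reported since the last reset.
    func newCommands(in transcript: String) -> [VoiceCommand] {
        let text = transcript.lowercased()
        var fresh: [VoiceCommand] = []
        for command in VoiceCommand.allCases where text.contains(command.rawValue) {
            // `inserted` is false when the command was already handled.
            if handled.insert(command).inserted {
                fresh.append(command)
            }
        }
        return fresh
    }

    /// Call when the utterance ends so the same command can fire again later.
    func reset() {
        handled.removeAll()
    }
}
```

Inside the result handler, the formattedString of the transcription could be passed to newCommands(in:), and the screen opened only for the commands it returns. When reset() should be called is an app-level decision that this sketch leaves open.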

I also looked at the Timer-based approach I found while searching (SFSpeechRecognizer - detect end of utterance), but it does not work in my scenario because I never stop the audio engine: it keeps listening to the user's voice continuously, as Siri does.

B) Via the delegate (SFSpeechRecognitionTaskDelegate)

speechRecognizer.recognitionTask(with: self.request, delegate: self)

func speechRecognitionTaskWasCancelled(_ task: SFSpeechRecognitionTask) {

}

func speechRecognitionTask(_ task: SFSpeechRecognitionTask, didFinishSuccessfully successfully: Bool) {

}

However, the delegate method that should fire when speech ends is not called at that point; it only fires unpredictably, some time later.

Asked Apr 06 '18 12:04 by Ramkrishna Sharma


1 Answer

I had the same issue until now.

I read your question, and I think the code below will help you achieve the same thing I did:

recognitionTask = speechRecognizer?.recognitionTask(with: recognitionRequest, 
resultHandler: { (result, error) in

    var isFinal = false

    // Show the latest (possibly partial) transcription and note whether it is final.
    if let result = result {
        self.inputTextView.text = result.bestTranscription.formattedString
        isFinal = result.isFinal
    }

    if let timer = self.detectionTimer, timer.isValid {
        // A timer is already running: only a final result clears the text and cancels it.
        if isFinal {
            self.inputTextView.text = ""
            self.textViewDidChange(self.inputTextView)
            self.detectionTimer?.invalidate()
        }
    } else {
        // No timer is running: schedule one that sends the text after 1.5 seconds.
        self.detectionTimer = Timer.scheduledTimer(withTimeInterval: 1.5, repeats: false, block: { (timer) in
            self.handleSend()
            timer.invalidate()
        })
    }

})

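This calls handleSend() once the 1.5-second timer fires, treating the utterance as finished. Because the timer is scheduled only when no valid timer exists, it measures 1.5 seconds from the first result callback rather than from the last one; invalidate and reschedule the timer on every callback if you want it to fire only after 1.5 seconds without new input.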
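The silence check can also be separated from Timer and the UI so that it is easier to reason about. The following is a minimal sketch, not the approach used in the answer above: SilenceDetector and its methods are hypothetical names, and it assumes the app feeds in every partial transcription and polls the detector periodically (for example from a repeating timer).

```swift
import Foundation

/// Tracks when the transcript last changed and reports the utterance as
/// finished once it has stayed unchanged for `silenceInterval` seconds.
struct SilenceDetector {
    let silenceInterval: TimeInterval
    private var lastText = ""
    private var lastChange: Date?

    init(silenceInterval: TimeInterval = 1.5) {
        self.silenceInterval = silenceInterval
    }

    /// Feed every partial transcription here, e.g. from the result handler.
    /// Repeated deliveries of the same text do not restart the interval.
    mutating func update(text: String, at now: Date = Date()) {
        guard text != lastText else { return }
        lastText = text
        lastChange = now
    }

    /// Poll periodically. Returns the utterance once after the transcript
    /// has been stable for `silenceInterval`, otherwise nil.
    mutating func finishedUtterance(at now: Date = Date()) -> String? {
        guard let changed = lastChange,
              now.timeIntervalSince(changed) >= silenceInterval else {
            return nil
        }
        lastChange = nil // report each utterance only once
        return lastText
    }
}
```

Passing the timestamps in explicitly keeps the logic deterministic. Because the audio engine keeps running, the transcription may keep growing across utterances; how the app separates one utterance from the next (for example by starting a new recognition request) is outside this sketch.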

Answered Nov 09 '22 20:11 by Muhammad Essa