
iOS10 Speech Recognition "Listening" sound effect

I am doing live speech recognition with the new iOS 10 Speech framework. I use AVCaptureSession to capture the audio.

I have a "listening" beep sound to notify the user that they can begin talking. The best place to play that sound is at the first call to captureOutput(_:didOutputSampleBuffer:from:), but if I try to play a sound after starting the session, the sound just won't play. And no error is thrown... it just silently fails to play...

What I tried:

  • Playing a system sound (AudioServicesPlaySystemSound(...))
  • Playing an asset with AVPlayer
  • Trying both of the above, synchronously and asynchronously, on the main queue

It seems that regardless of what I do, it is impossible to play any kind of audio after triggering the recognition (I'm not sure whether it's specifically the AVCaptureSession or the SFSpeechAudioBufferRecognitionRequest / SFSpeechRecognitionTask that blocks it...).

Any ideas? Apple even recommends playing a "listening" sound effect (and does so itself with Siri), but I couldn't find any reference/example showing how to actually do it... (their "SpeakToMe" example doesn't play a sound)

  • I can play the sound before triggering the session, and it does work (when starting the session in the completion of playing the sound), but sometimes there's a lag in actually starting the recognition - mostly when using BT headphones and switching from a different AudioSession category, for which I don't get a completion event... Because of that, I need a way to play the sound when the recording actually starts, rather than playing it beforehand and crossing my fingers that the recognition won't lag.
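
One thing that might reduce the category-switch lag described above is configuring the shared AVAudioSession for simultaneous playback and recording up front, so no category change happens at session start. A rough sketch (the category and options chosen here are my assumptions, iOS 10-era API, not something from the question):

    import AVFoundation

    func prepareAudioSession() throws {
        let session = AVAudioSession.sharedInstance()
        // PlayAndRecord lets a "listening" sound play while the mic is capturing
        try session.setCategory(AVAudioSessionCategoryPlayAndRecord,
                                with: [.defaultToSpeaker, .allowBluetooth])
        try session.setActive(true)
    }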
Aviel Gross asked Oct 18 '22

1 Answer

Well, apparently there are a bunch of "rules" one must follow in order to successfully begin a speech recognition session and play a "listening" effect only when (after) the recognition has really begun.

  1. The session setup & triggering must be called on the main queue. So:

    DispatchQueue.main.async {
        speechRequest = SFSpeechAudioBufferRecognitionRequest()
        task = recognizer.recognitionTask(with: speechRequest, delegate: self)
        capture = AVCaptureSession()
        //..... configure the capture session's audio input/output here
        shouldHandleRecordingBegan = true
        capture?.startRunning()
    }
    
  2. The "listening" effect should be player via AVPlayer, not as a system sound.

  3. The safest place to know we are definitely recording is in the delegate call of AVCaptureAudioDataOutputSampleBufferDelegate, when we get our first sampleBuffer callback:

    func captureOutput(_ captureOutput: AVCaptureOutput!, didOutputSampleBuffer sampleBuffer: CMSampleBuffer!, from connection: AVCaptureConnection!) {
    
        //only once per recognition session
        if shouldHandleRecordingBegan {
            shouldHandleRecordingBegan = false
    
            player = AVPlayer(url: Bundle.main.url(forResource: "listening", withExtension: "aiff")!)
            player.play()            
    
            DispatchQueue.main.async {
                //call delegate/handler closure/post notification etc...
            }
        }
    
        // append buffer to speech recognition
        speechRequest?.appendAudioSampleBuffer(sampleBuffer)
    }
    
  4. The end-of-recognition effect is a hell of a lot easier:

    var ended = false
    
    if task?.state == .running || task?.state == .starting {
        task?.finish() // or task?.cancel() to cancel and not get results.
        ended = true
    }
    
    if capture?.isRunning == true {
        capture?.stopRunning()
    }
    
    if ended {
        player = AVPlayer(url: Bundle.main.url(forResource: "done", withExtension: "aiff")!)
        player.play()
    }
    
Aviel Gross answered Oct 21 '22