
Why can't I control the Apple macOS Speech Synthesis audio unit with slider values?

I'm working to incorporate Apple's speech synthesis audio unit (which works only on macOS, not iOS) into AudioKit. I've built an AKSpeechSynthesizer class (initially created by wangchou in this pull request) and a demo project, both available on the develop branch of AudioKit.

My project is very similar to this Cocoa Speech Synthesis Example. In that example, the rate variable can be varied smoothly from a low number of words per minute (40) up to a high number (around 300). My project, however, starts at the default rate of 175, and any change slows the speech to a crawl, except that setting it to 350 makes it go super fast.

I can't see what I am doing differently from this example, as both projects rely on

SetSpeechProperty(speechChannel, kSpeechRateProperty, newRate as NSNumber?)

to set the rate.

Here's my implementation and the working one.

The biggest difference is that my synthesizer is set up as an audio unit, whereas I think the working example just uses the default output to speaker.

The other parameters, frequency (pitch) and modulation (pitchMod), also exhibit strange behavior, but it's less noticeable on those, and they work a little funny in both projects.

Can someone tell me why mine doesn't work or fix it via a pull request? Any help would be greatly appreciated and attributed within the code.

Thanks!

Aurelius Prochazka asked Apr 08 '18 09:04



1 Answer

It seems like the rate, pitch and modulation speech properties need to be integral values, without fractional parts, for everything to work properly.

The CocoaSpeechSynthesis example actually exhibits the same behaviour, but initialises the rate field to an integral value. To reproduce the problem, try setting the rate first to 333, and then 333.3, for instance.

The other pitch and modulation parameters appear to be equally picky about fractional parts and seem to only yield reasonable results when set to integral values as well.

Unfortunately, I could not find any reference documentation that confirms these findings, but here is a patch that makes the three speech parameters behave in the SpeechSynthesizer example project:

diff --git a/AudioKit/Common/Nodes/Generators/Speech Synthesizer/AKSpeechSynthesizer.swift b/AudioKit/Common/Nodes/Generators/Speech Synthesizer/AKSpeechSynthesizer.swift
index 81286b8fb..324966e13 100644
--- a/AudioKit/Common/Nodes/Generators/Speech Synthesizer/AKSpeechSynthesizer.swift 
+++ b/AudioKit/Common/Nodes/Generators/Speech Synthesizer/AKSpeechSynthesizer.swift 
@@ -47,7 +47,7 @@ open class AKSpeechSynthesizer: AKNode {
                return
            }
            AKLog("Trying to set new rate")
-            let _ = SetSpeechProperty(speechChannel, kSpeechRateProperty, newRate as NSNumber?)
+            let _ = SetSpeechProperty(speechChannel, kSpeechRateProperty, newRate.rounded() as NSNumber?)
        }
    }

@@ -70,7 +70,7 @@ open class AKSpeechSynthesizer: AKNode {
                return
            }
            AKLog("Trying to set new freq")
-            let _ = SetSpeechProperty(speechChannel, kSpeechPitchBaseProperty, newFrequency as NSNumber?)
+            let _ = SetSpeechProperty(speechChannel, kSpeechPitchBaseProperty, newFrequency.rounded() as NSNumber?)
        }
    }

@@ -93,7 +93,7 @@ open class AKSpeechSynthesizer: AKNode {
                return
            }
            AKLog("Trying to set new modulation")
-            let _ = SetSpeechProperty(speechChannel, kSpeechPitchModProperty, newModulation as NSNumber?)
+            let _ = SetSpeechProperty(speechChannel, kSpeechPitchModProperty, newModulation.rounded() as NSNumber?)
        }
    }

It's just three extra calls to Swift's rounded() method.
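The same idea can be isolated in a small helper, shown here only as a sketch. The function name integralSpeechValue is hypothetical and not part of AudioKit or Apple's API; it simply rounds a slider value to the nearest whole number and wraps it as an NSNumber, which is the form the patch passes to SetSpeechProperty. Whether integral values are formally required is an observation from testing, not something confirmed by documentation.

```swift
import Foundation

/// Hypothetical helper (not part of AudioKit): drops the fractional part of a
/// slider value by rounding to the nearest whole number, then boxes it as an
/// NSNumber so it can be handed to a speech property setter.
func integralSpeechValue(_ sliderValue: Double) -> NSNumber {
    // rounded() uses the "to nearest, ties away from zero" rule by default.
    return NSNumber(value: sliderValue.rounded())
}

// Values a continuous slider might produce.
let problematicRate = 333.3
let tieRate = 174.5

print(integralSpeechValue(problematicRate).doubleValue) // 333.0
print(integralSpeechValue(tieRate).doubleValue)         // 175.0
```

Inside the setters, the result would take the place of the raw value, as in the patched lines above. Rounding at the point where the property is set, rather than on the slider itself, keeps the UI control continuous while the speech channel only ever sees whole numbers.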

Nicolas Tisserand answered Sep 30 '22 18:09
