In a text to speech application by C# I use SpeechSynthesizer
class, it has an event named SpeakProgress
which is fired for every spoken word. But for some voices the parameter e.AudioPosition
is not synchronized with the output audio stream, and the output wave file is played faster than what this position shows (see this related question).
Anyway, I am trying to find the exact information about the bit rate and other information related to the selected voice. As I have experienced if I can initialize the wave file with this information, the synchronizing problem will be resolved. However, if I can't find such information in the SupportedAudioFormat
, I know no other way to find them. For example the "Microsoft David Desktop" voice provides no supported format in the VoiceInfo
, but it seems it supports a PCM 16000 hz, 16 bit format.
How can I find audio format of the selected voice of the SpeechSynthesizer
var formats = CurVoice.VoiceInfo.SupportedAudioFormats;
if (formats.Count > 0)
{
var format = formats[0];
reader.SetOutputToWaveFile(CurAudioFile, format);
}
else
{
var format = // How can I find it, if the audio hasn't provided it?
reader.SetOutputToWaveFile(CurAudioFile, format );
}
Update: This answer has been edited following investigation. Initially I was suggesting from memory that SupportedAudioFormats is likely just from (possibly misconfigured) registry data; investigation has shown that for me, on Windows 7, this is definitely the case, and is backed up acecdotally on Windows 8.
System.Speech
wraps the venerable COM speech API (SAPI) and some voices are 32 vs 64 bit, or can be misconfigured (on a 64 bit machine's registry, HKLM/Software/Microsoft/Speech/Voices
vs HKLM/Software/Wow6432Node/Microsoft/Speech/Voices
.
I've pointed ILSpy at System.Speech
and its VoiceInfo
class, and I'm pretty convinced that SupportedAudioFormats comes solely from registry data, hence it's possible to get zero results back when enumerating SupportedAudioFormats
if either your TTS engine isn't properly registered for your application's Platform target (x86, Any or 64 bit), or if the vendor simply doesn't provide this information in the registry.
Voices may still support different, additional or fewer formats, as that's up to the speech engine (code) rather than the registry (data). So it can be a shot in the dark. Standard Windows voices are often times more consistent in this regard than third party voices, but they still don't necessarily usefully provide SupportedAudioFormats
.
I've found it's still possible to get the current format of the current voice - but this does rely on reflection to access the internals of the System.Speech SAPI wrappers.
Consequently this is quite fragile code! And I wouldn't recommend use in production.
Note: The below code does require you to have called Speak() once for setup; more calls would be needed to force setup without Speak(). However, I can call Speak("")
to say nothing and that works just fine.
Implementation:
[StructLayout(LayoutKind.Sequential)]
struct WAVEFORMATEX
{
public ushort wFormatTag;
public ushort nChannels;
public uint nSamplesPerSec;
public uint nAvgBytesPerSec;
public ushort nBlockAlign;
public ushort wBitsPerSample;
public ushort cbSize;
}
WAVEFORMATEX GetCurrentWaveFormat(SpeechSynthesizer synthesizer)
{
var voiceSynthesis = synthesizer.GetType()
.GetProperty("VoiceSynthesizer", BindingFlags.Instance | BindingFlags.NonPublic)
.GetValue(synthesizer, null);
var ttsVoice = voiceSynthesis.GetType()
.GetMethod("CurrentVoice", BindingFlags.Instance | BindingFlags.NonPublic)
.Invoke(voiceSynthesis, new object[] { false });
var waveFormat = (byte[])ttsVoice.GetType()
.GetField("_waveFormat", BindingFlags.Instance | BindingFlags.NonPublic)
.GetValue(ttsVoice);
var pin = GCHandle.Alloc(waveFormat, GCHandleType.Pinned);
var format = (WAVEFORMATEX)Marshal.PtrToStructure(pin.AddrOfPinnedObject(), typeof(WAVEFORMATEX));
pin.Free();
return format;
}
Usage:
SpeechSynthesizer s = new SpeechSynthesizer();
s.Speak("Hello");
var format = GetCurrentWaveFormat(s);
Debug.WriteLine($"{s.Voice.SupportedAudioFormats.Count} formats are claimed as supported.");
Debug.WriteLine($"Actual format: {format.nChannels} channel {format.nSamplesPerSec} Hz {format.wBitsPerSample} audio");
To test it, I renamed Microsoft Anna's AudioFormats
registry key under HKLM/Software/Wow6432Node/Microsoft/Speech/Voices/Tokens/MS-Anna-1033-20-Dsk/Attributes
, causing SpeechSynthesizer.Voice.SupportedAudioFormats
to have no elements when queried. The below is the output in this situation:
0 formats are claimed as supported.
Actual format: 1 channel 16000 Hz 16 audio
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With