
How can I find the audio format of the selected voice of the SpeechSynthesizer?

In a C# text-to-speech application I use the SpeechSynthesizer class. It has an event named SpeakProgress, which is fired for every spoken word. For some voices, however, the e.AudioPosition parameter is not synchronized with the output audio stream, and the output wave file plays faster than this position indicates (see this related question).

I am trying to find the exact bit rate and other format details of the selected voice. In my experience, if I initialize the wave file with this information, the synchronization problem is resolved. However, when the information is missing from SupportedAudioFormats, I know of no other way to find it. For example, the "Microsoft David Desktop" voice lists no supported formats in its VoiceInfo, yet it seems to support a 16000 Hz, 16-bit PCM format.

How can I find the audio format of the selected voice of the SpeechSynthesizer when the voice does not list it?

var formats = CurVoice.VoiceInfo.SupportedAudioFormats;

if (formats.Count > 0)
{
    var format = formats[0];
    reader.SetOutputToWaveFile(CurAudioFile, format);
}
else
{
    var format = // How can I find it if the voice hasn't provided it?
    reader.SetOutputToWaveFile(CurAudioFile, format);
}
Ahmad asked Dec 08 '15 07:12



1 Answer

Update: This answer has been edited following investigation. Initially I suggested from memory that SupportedAudioFormats likely comes just from (possibly misconfigured) registry data. Investigation has shown that for me, on Windows 7, this is definitely the case, and it is backed up anecdotally on Windows 8.

Issues with SupportedAudioFormats

System.Speech wraps the venerable COM speech API (SAPI). Some voices are 32-bit and others 64-bit, and a voice can be misconfigured: on a 64-bit machine the registry holds two separate voice lists, HKLM\Software\Microsoft\Speech\Voices and HKLM\Software\Wow6432Node\Microsoft\Speech\Voices.

I've pointed ILSpy at System.Speech and its VoiceInfo class, and I'm pretty convinced that SupportedAudioFormats comes solely from registry data. Hence it's possible to get zero results back when enumerating SupportedAudioFormats, either because your TTS engine isn't properly registered for your application's platform target (x86, Any CPU or x64) or because the vendor simply doesn't provide this information in the registry.

Voices may still support different, additional or fewer formats, as that's up to the speech engine (code) rather than the registry (data). So it can be a shot in the dark. Standard Windows voices are often more consistent in this regard than third-party voices, but even they don't necessarily provide a useful SupportedAudioFormats list.
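To check what the registry actually claims, something like the following could be used. This is only a minimal sketch, not part of the investigation above: the VoiceRegistry class and ReadAudioFormats method are hypothetical names, and it assumes that AudioFormats is stored as a string value under each token's Attributes key. It reads one registry view at a time, so the 32-bit and 64-bit lists can be compared.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.Win32;

static class VoiceRegistry
{
    // Path relative to HKLM; the Wow6432Node copy is reached through RegistryView.Registry32.
    const string TokensPath = @"SOFTWARE\Microsoft\Speech\Voices\Tokens";

    // Returns voice token name -> AudioFormats attribute (null when absent)
    // for the requested registry view. Windows only.
    public static Dictionary<string, string> ReadAudioFormats(RegistryView view)
    {
        var result = new Dictionary<string, string>();

        using (var hklm = RegistryKey.OpenBaseKey(RegistryHive.LocalMachine, view))
        using (var tokens = hklm.OpenSubKey(TokensPath))
        {
            if (tokens == null)
                return result; // no voices registered in this view

            foreach (var name in tokens.GetSubKeyNames())
            {
                using (var attrs = tokens.OpenSubKey(name + @"\Attributes"))
                {
                    // Assumption: AudioFormats is a string value; anything else yields null.
                    result[name] = attrs?.GetValue("AudioFormats") as string;
                }
            }
        }

        return result;
    }
}
```

Calling ReadAudioFormats(RegistryView.Registry64) and ReadAudioFormats(RegistryView.Registry32) and comparing the two results should show whether a voice is missing from the view that matches your platform target, or is present but has no AudioFormats entry.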

Finding this Information the Hard Way

I've found it's still possible to get the current format of the current voice - but this does rely on reflection to access the internals of the System.Speech SAPI wrappers.

Consequently, this code is quite fragile, and I wouldn't recommend using it in production.

Note: The code below requires you to have called Speak() once so that the voice is set up; more calls would be needed to force setup without Speak(). However, calling Speak("") to say nothing works just fine.

Implementation:

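using System.Reflection;
using System.Runtime.InteropServices;
using System.Speech.Synthesis;

// Mirrors the native WAVEFORMATEX field order so the raw bytes can be marshalled into it.
[StructLayout(LayoutKind.Sequential)]
struct WAVEFORMATEX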
{
    public ushort wFormatTag;
    public ushort nChannels;
    public uint nSamplesPerSec;
    public uint nAvgBytesPerSec;
    public ushort nBlockAlign;
    public ushort wBitsPerSample;
    public ushort cbSize;
}

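// Reads the active voice's wave format via reflection over System.Speech internals.
// The member names used here are private implementation details and may change.
WAVEFORMATEX GetCurrentWaveFormat(SpeechSynthesizer synthesizer)
{
    // Private property on SpeechSynthesizer holding the internal synthesis object.
    var voiceSynthesis = synthesizer.GetType()
                                    .GetProperty("VoiceSynthesizer", BindingFlags.Instance | BindingFlags.NonPublic)
                                    .GetValue(synthesizer, null);

    // Non-public method returning the internal wrapper for the selected voice.
    var ttsVoice = voiceSynthesis.GetType()
                                 .GetMethod("CurrentVoice", BindingFlags.Instance | BindingFlags.NonPublic)
                                 .Invoke(voiceSynthesis, new object[] { false });

    // Private field holding the voice's output format as raw WAVEFORMATEX bytes.
    var waveFormat = (byte[])ttsVoice.GetType()
                                     .GetField("_waveFormat", BindingFlags.Instance | BindingFlags.NonPublic)
                                     .GetValue(ttsVoice);

    // Pin the array while it is read, and release the pin even if marshalling throws.
    var pin = GCHandle.Alloc(waveFormat, GCHandleType.Pinned);
    try
    {
        return (WAVEFORMATEX)Marshal.PtrToStructure(pin.AddrOfPinnedObject(), typeof(WAVEFORMATEX));
    }
    finally
    {
        pin.Free();
    }
}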
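
As an aside, the pinning step is not the only way to decode the byte[] read from _waveFormat. The following is a minimal sketch of an alternative that reads the fields with BitConverter at the standard WAVEFORMATEX offsets. The WaveFormatDecoder and WaveFormat names are hypothetical, and the code assumes a little-endian host (as on Windows) and at least the 16 bytes that precede cbSize.

```csharp
using System;

static class WaveFormatDecoder
{
    public struct WaveFormat
    {
        public ushort FormatTag;      // 1 = PCM
        public ushort Channels;
        public uint SamplesPerSec;
        public uint AvgBytesPerSec;
        public ushort BlockAlign;
        public ushort BitsPerSample;
        public ushort ExtraSize;      // cbSize; 0 when the buffer stops before it
    }

    // Decodes WAVEFORMATEX bytes without pinning or marshalling.
    public static WaveFormat Decode(byte[] data)
    {
        if (data == null) throw new ArgumentNullException(nameof(data));
        if (data.Length < 16) throw new ArgumentException("Expected at least 16 bytes.", nameof(data));

        return new WaveFormat
        {
            FormatTag      = BitConverter.ToUInt16(data, 0),
            Channels       = BitConverter.ToUInt16(data, 2),
            SamplesPerSec  = BitConverter.ToUInt32(data, 4),
            AvgBytesPerSec = BitConverter.ToUInt32(data, 8),
            BlockAlign     = BitConverter.ToUInt16(data, 12),
            BitsPerSample  = BitConverter.ToUInt16(data, 14),
            ExtraSize      = data.Length >= 18 ? BitConverter.ToUInt16(data, 16) : (ushort)0
        };
    }

    // For PCM, the byte rate should equal sample rate times block alignment.
    public static bool IsConsistentPcm(WaveFormat f)
    {
        return f.FormatTag == 1
            && f.BlockAlign == f.Channels * f.BitsPerSample / 8
            && f.AvgBytesPerSec == f.SamplesPerSec * f.BlockAlign;
    }
}
```

The consistency check is optional, but it is a cheap way to notice that an internal field no longer holds what this code expects.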

Usage:

SpeechSynthesizer s = new SpeechSynthesizer();
s.Speak("Hello");
var format = GetCurrentWaveFormat(s);
Debug.WriteLine($"{s.Voice.SupportedAudioFormats.Count} formats are claimed as supported.");
Debug.WriteLine($"Actual format: {format.nChannels} channel {format.nSamplesPerSec} Hz {format.wBitsPerSample} audio");

To test it, I renamed Microsoft Anna's AudioFormats registry key under HKLM\Software\Wow6432Node\Microsoft\Speech\Voices\Tokens\MS-Anna-1033-20-Dsk\Attributes, causing SpeechSynthesizer.Voice.SupportedAudioFormats to have no elements when queried. This is the output in that situation:

0 formats are claimed as supported.
Actual format: 1 channel 16000 Hz 16 audio
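
To tie this back to the question, the recovered values can be turned into a SpeechAudioFormatInfo and passed to SetOutputToWaveFile. The following is a minimal sketch under a few assumptions: WaveFormatBridge, ToFormatInfo and SpeakToFile are hypothetical names, only plain PCM is handled, and the arguments are the corresponding fields of the WAVEFORMATEX returned by GetCurrentWaveFormat.

```csharp
using System;
using System.Speech.AudioFormat;
using System.Speech.Synthesis;

static class WaveFormatBridge
{
    // Maps raw WAVEFORMATEX-style values to the public SpeechAudioFormatInfo type.
    // Only plain PCM (format tag 1) is handled in this sketch.
    public static SpeechAudioFormatInfo ToFormatInfo(
        ushort formatTag, ushort channels, uint samplesPerSec,
        uint avgBytesPerSec, ushort blockAlign, ushort bitsPerSample)
    {
        if (formatTag != 1)
            throw new NotSupportedException("Only PCM (format tag 1) is handled here.");

        return new SpeechAudioFormatInfo(
            EncodingFormat.Pcm,
            (int)samplesPerSec,
            bitsPerSample,
            channels,
            (int)avgBytesPerSec,
            blockAlign,
            new byte[0]); // plain PCM carries no format-specific data
    }

    // Writes the spoken text to a wave file whose header uses the given format.
    public static void SpeakToFile(SpeechSynthesizer synth, string path,
                                   SpeechAudioFormatInfo info, string text)
    {
        synth.SetOutputToWaveFile(path, info);
        synth.Speak(text);
        synth.SetOutputToNull(); // switch output away from the file once speaking is done
    }
}
```

For the output shown above, the call would be ToFormatInfo(1, 1, 16000, 32000, 2, 16), where the byte rate and block alignment follow from 16000 samples per second at 2 bytes per mono sample. Whether this resolves the AudioPosition drift for a given voice would still need to be verified by playing back the result.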
El Zorko answered Sep 23 '22 02:09
