TTS to Stream with SpeechAudioFormatInfo using SpeechSynthesizer

I am using System.Speech.Synthesis.SpeechSynthesizer to convert text to speech. Due to Microsoft's anemic documentation (see my link; there are no remarks or code examples), I'm having trouble making heads or tails of the difference between two methods:

SetOutputToAudioStream and SetOutputToWaveStream.

Here's what I have deduced:

SetOutputToAudioStream takes a stream and a SpeechAudioFormatInfo instance that defines the format of the wave file (samples per second, bits per sample, audio channels, etc.) and writes the synthesized audio to the stream.

SetOutputToWaveStream takes just a stream and writes a 16-bit, mono, 22 kHz PCM wave file to the stream. There is no way to pass in a SpeechAudioFormatInfo.

My problem is that SetOutputToAudioStream doesn't write a valid wave file to the stream. For example, I get an InvalidOperationException ("The wave header is corrupt") when passing the stream to System.Media.SoundPlayer. If I write the stream to disk and attempt to play it with WMP, I get a "Windows Media Player cannot play the file..." error, yet the stream written by SetOutputToWaveStream plays properly in both. My theory is that SetOutputToAudioStream is not writing a (valid) header.

Strangely, the naming convention for the SetOutputTo* methods is inconsistent: SetOutputToWaveFile takes a SpeechAudioFormatInfo while SetOutputToWaveStream does not.

I need to be able to write an 8 kHz, 16-bit, mono wave file to a stream, something that neither SetOutputToAudioStream nor SetOutputToWaveStream allows me to do. Does anybody have insight into SpeechSynthesizer and these two methods?

For reference, here's some code:

Stream ret = new MemoryStream();
using (SpeechSynthesizer synth = new SpeechSynthesizer())
{
  synth.SelectVoice(voiceName);
  synth.SetOutputToWaveStream(ret);  // valid .wav, but stuck with the default 22 kHz format
  //synth.SetOutputToAudioStream(ret, new SpeechAudioFormatInfo(8000, AudioBitsPerSample.Sixteen, AudioChannel.Mono));  // desired format, but headerless PCM
  synth.Speak(textToSpeak);
}

Solution:

Many thanks to @Hans Passant; here is the gist of what I'm using now:

Stream ret = new MemoryStream();
using (SpeechSynthesizer synth = new SpeechSynthesizer())
{
  // SetOutputStream is the private method the public SetOutputTo* methods
  // funnel into; the two bools appear to request a RIFF header and control
  // stream ownership (private API, so the parameter meaning is inferred)
  var mi = synth.GetType().GetMethod("SetOutputStream", BindingFlags.Instance | BindingFlags.NonPublic);
  var fmt = new SpeechAudioFormatInfo(8000, AudioBitsPerSample.Sixteen, AudioChannel.Mono);
  mi.Invoke(synth, new object[] { ret, fmt, true, true });
  synth.SelectVoice(voiceName);
  synth.Speak(textToSpeak);
}
return ret;

For my rough testing it works great. Though using reflection is a bit icky, it's better than writing the file to disk and opening a stream.
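As a quick sanity check, the stream can now be fed straight back to SoundPlayer, the same class that threw the corrupt-header exception earlier (a minimal sketch, assuming ret from the snippet above):

ret.Position = 0;
using (var player = new System.Media.SoundPlayer(ret))
{
    player.PlaySync();  // blocks until playback completes
}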

Asked Oct 06 '10 by AceJordin

1 Answer

Your code snippet is borked: you're using synth after it is disposed. But that's not the real problem, I'm sure. SetOutputToAudioStream produces the raw PCM audio, the 'numbers', without a container file format (headers) like what's used in a .wav file. Yes, that cannot be played back with a regular media program.
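To make that concrete, the missing container is just a 44-byte RIFF/WAVE header sitting in front of the PCM samples. Here's a minimal sketch of writing one by hand (WriteWavHeader is a hypothetical helper, not part of the framework), in case you want to keep SetOutputToAudioStream and fix up the stream yourself:

using System.IO;
using System.Text;

// Prepends the canonical 44-byte RIFF/WAVE header for uncompressed PCM so the
// raw samples from SetOutputToAudioStream become a valid .wav payload.
// dataLength is the byte count of the PCM data that follows the header.
static void WriteWavHeader(Stream s, int sampleRate, short bitsPerSample,
                           short channels, int dataLength)
{
    short blockAlign = (short)(channels * (bitsPerSample / 8));
    int byteRate = sampleRate * blockAlign;
    var w = new BinaryWriter(s);      // intentionally not disposed: that would close s
    w.Write(Encoding.ASCII.GetBytes("RIFF"));
    w.Write(36 + dataLength);         // remaining file size after these 8 bytes
    w.Write(Encoding.ASCII.GetBytes("WAVE"));
    w.Write(Encoding.ASCII.GetBytes("fmt "));
    w.Write(16);                      // fmt chunk length for plain PCM
    w.Write((short)1);                // wFormatTag = 1 (PCM)
    w.Write(channels);
    w.Write(sampleRate);
    w.Write(byteRate);                // average bytes per second
    w.Write(blockAlign);
    w.Write(bitsPerSample);
    w.Write(Encoding.ASCII.GetBytes("data"));
    w.Write(dataLength);              // size of the PCM payload
    w.Flush();
}

Synthesize into a temporary MemoryStream first so you know dataLength, then write the header followed by the captured samples into the final stream.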

The missing overload for SetOutputToWaveStream that takes a SpeechAudioFormatInfo is strange. It really does look like an oversight to me, even though that's extremely rare in the .NET Framework. There's no compelling reason why it shouldn't work; the underlying SAPI interface does support it. It can be hacked around with reflection to call the private SetOutputStream method. This worked fine when I tested it, but I can't vouch for it:

using System.Reflection;
...
using (Stream ret = new MemoryStream())
using (SpeechSynthesizer synth = new SpeechSynthesizer()) {
    var mi = synth.GetType().GetMethod("SetOutputStream", BindingFlags.Instance | BindingFlags.NonPublic);
    var fmt = new SpeechAudioFormatInfo(8000, AudioBitsPerSample.Eight, AudioChannel.Mono);
    mi.Invoke(synth, new object[] { ret, fmt, true, true });
    synth.Speak("Greetings from stack overflow");
    // Testing code:
    using (var fs = new FileStream(@"c:\temp\test.wav", FileMode.Create, FileAccess.Write, FileShare.None)) {
        ret.Position = 0;
        byte[] buffer = new byte[4096];
        for (;;) {
            int len = ret.Read(buffer, 0, buffer.Length);
            if (len == 0) break;
            fs.Write(buffer, 0, len);
        }
    }
}

If you're uncomfortable with the hack then using Path.GetTempFileName() to temporarily stream it to a file will certainly work.
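For reference, here is a sketch of that reflection-free route (textToSpeak and the 8 kHz format are taken from the question; error handling omitted):

using System.IO;
using System.Speech.AudioFormat;
using System.Speech.Synthesis;

// SetOutputToWaveFile *does* accept a SpeechAudioFormatInfo, so synthesize to
// a temp file in the desired format and read the finished .wav back into memory.
string tmp = Path.GetTempFileName();
try
{
    using (var synth = new SpeechSynthesizer())
    {
        synth.SetOutputToWaveFile(tmp,
            new SpeechAudioFormatInfo(8000, AudioBitsPerSample.Sixteen, AudioChannel.Mono));
        synth.Speak(textToSpeak);
    }
    Stream ret = new MemoryStream(File.ReadAllBytes(tmp));
    // ... hand ret to whatever needs the audio ...
}
finally
{
    File.Delete(tmp);
}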

Answered by Hans Passant