What is the difference between System.Speech.Recognition and Microsoft.Speech.Recognition?

There are two similar namespaces and assemblies for speech recognition in .NET. I’m trying to understand the differences and when it is appropriate to use one or the other.

There is System.Speech.Recognition from the assembly System.Speech (in System.Speech.dll). System.Speech.dll is a core DLL in the .NET Framework class library, version 3.0 and later.

There is also Microsoft.Speech.Recognition from the assembly Microsoft.Speech (in Microsoft.Speech.dll). Microsoft.Speech.dll is part of the UCMA 2.0 SDK.

I find the docs confusing and I have the following questions:

System.Speech.Recognition says it is for "The Windows Desktop Speech Technology". Does this mean it cannot be used on a server OS, or that it cannot be used for high-scale applications?

The UCMA 2.0 Speech SDK ( http://msdn.microsoft.com/en-us/library/dd266409%28v=office.13%29.aspx ) says that it requires Microsoft Office Communications Server 2007 R2 as a prerequisite. However, I’ve been told at conferences and meetings that if I do not require OCS features like presence and workflow I can use the UCMA 2.0 Speech API without OCS. Is this true?

If I’m building a simple recognition app for a server application (say I wanted to automatically transcribe voice mails) and I don’t need features of OCS, what are the differences between the two APIs?

871

asked Jun 04 '10 19:06

Michael Levy

1 Answer

The short answer is that Microsoft.Speech.Recognition uses the Server version of SAPI, while System.Speech.Recognition uses the Desktop version of SAPI.

The APIs are mostly the same, but the underlying engines are different. Typically, the Server engine is designed to accept telephone-quality audio for command & control applications; the Desktop engine is designed to accept higher-quality audio for both command & control and dictation applications.
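To illustrate how similar the two APIs are, here is a minimal command & control sketch against System.Speech.Recognition. The file name `voicemail.wav` and the three phrases are made up for illustration. My understanding is that the same code compiles against the Server engine if you reference Microsoft.Speech.dll and change the `using` line to `Microsoft.Speech.Recognition`, provided a matching server recognizer is installed; treat that as an assumption to verify on your machine.

```csharp
using System;
using System.Globalization;
using System.Speech.Recognition; // Desktop engine; the Server engine lives in Microsoft.Speech.Recognition

class CommandRecognitionDemo
{
    static void Main(string[] args)
    {
        // Hypothetical input file; pass a real path as the first argument.
        string wavPath = args.Length > 0 ? args[0] : "voicemail.wav";

        // Throws if no en-US recognizer is installed.
        using (var engine = new SpeechRecognitionEngine(new CultureInfo("en-US")))
        {
            // A small fixed grammar: the engine can only return one of these phrases.
            var phrases = new Choices("call me back", "cancel my order", "leave a message");
            var builder = new GrammarBuilder(phrases);
            builder.Culture = engine.RecognizerInfo.Culture;
            engine.LoadGrammar(new Grammar(builder));

            // Read audio from a file instead of the default microphone.
            engine.SetInputToWaveFile(wavPath);

            // Recognize() returns null when nothing matched the grammar.
            RecognitionResult result = engine.Recognize();
            if (result != null)
                Console.WriteLine(result.Text + " (confidence " + result.Confidence + ")");
            else
                Console.WriteLine("No phrase recognized.");
        }
    }
}
```

A fixed phrase list like this is the kind of workload both engines handle; only the namespace and the installed recognizer differ.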

You can use System.Speech.Recognition on a server OS, but it's not designed to scale nearly as well as Microsoft.Speech.Recognition.

In practice, the Server engine doesn't need training and works with lower-quality audio, but its recognition quality is lower than the Desktop engine's.
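For the voicemail transcription scenario in the question, free-form dictation is the relevant feature, and that is a Desktop engine capability. The sketch below uses `DictationGrammar` from System.Speech.Recognition; as far as I know Microsoft.Speech.Recognition has no equivalent class, so this part is not portable to the Server engine. The file name is hypothetical, and the simple loop is an assumption about how you might collect results, not a recommended production design.

```csharp
using System;
using System.Speech.Recognition; // Desktop engine only: dictation is not a Server engine feature
using System.Text;

class VoicemailTranscriber
{
    static void Main(string[] args)
    {
        // Hypothetical input file; pass a real path as the first argument.
        string wavPath = args.Length > 0 ? args[0] : "voicemail.wav";
        var transcript = new StringBuilder();

        // Uses the system default desktop recognizer; throws if none is installed.
        using (var engine = new SpeechRecognitionEngine())
        {
            // Free-form dictation instead of a fixed phrase list.
            engine.LoadGrammar(new DictationGrammar());
            engine.SetInputToWaveFile(wavPath);

            // Each Recognize() call returns one phrase. It returns null at the end
            // of the file, but also after a long silence or an unrecognized phrase,
            // so this loop may stop before the audio is fully consumed.
            RecognitionResult result;
            while ((result = engine.Recognize()) != null)
            {
                transcript.Append(result.Text).Append(' ');
            }
        }

        Console.WriteLine(transcript.ToString().Trim());
    }
}
```

Expect noticeably worse results on telephone-quality audio than on a clean microphone recording, since the Desktop engine is tuned for higher-quality input.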

152

answered Sep 20 '22 01:09

Eric Brown

Tags: .net, speech-recognition, speech, ucma2.0, ucs
