Difference among Microsoft Speech products/platforms

Microsoft seems to offer quite a few speech recognition products, and I'd like to know the differences among them.

There is the Microsoft Speech API, or SAPI. But somehow the Microsoft Cognitive Services Speech API has the same name.
Microsoft Cognitive Services on Azure offers both the Speech service API and the Bing Speech API. I assume that for speech-to-text, both APIs are the same.
Then there are System.Speech.Recognition (or Desktop SAPI), Microsoft.Speech.Recognition (or Server SAPI) and Windows.Media.SpeechRecognition. I have found some explanations of the differences among the three. My guess is that they are older speech recognition models based on HMMs, i.e. not neural network models, and that all three can be used offline without an internet connection, right?
The Azure Speech service and Bing Speech APIs use more advanced speech models, right? But I assume there is no way to use them offline on my local machine, as they all require subscription verification (even though it seems the Bing API has a C# desktop library).

Essentially, I want an offline model that does speech-to-text transcription of my conversation data (5-10 minutes per audio recording), recognises multiple speakers, and outputs timestamps (or timecoded output). I am a bit confused by all the options. I would greatly appreciate it if someone could explain this to me. Many thanks!

asked Jun 12 '18 17:06

Blue482

1 Answer

A difficult question - and part of the reason why it is so difficult: We (Microsoft) seem to present an incoherent story about 'speech' and 'speech apis'. Although I work for Microsoft, the following is my view on this. I try to give some insight on what is being planned in my team (Cognitive Service Speech - Client SDK), but I can't predict all facets of the not-so-near-future.

Early on, Microsoft recognized that speech is an important medium, so Microsoft has an extensive and long-running history of enabling speech in its products. There are really good speech solutions (with local recognition) available, and you listed some of those.

We are working on unifying this and presenting one place for you to find the state-of-the-art speech solution at Microsoft. This is 'Microsoft Speech Service' (https://docs.microsoft.com/de-de/azure/cognitive-services/speech-service/) - currently in preview.

On the service side, it will combine our major speech technologies, like speech-to-text, text-to-speech, intent, and translation (and future services), under one umbrella. Speech and language models are constantly improved and updated. We are developing a client SDK for this service. Over time (later this year) this SDK will be available on all major operating systems (Windows, Linux, Android, iOS) and will support the major programming languages. We will continue to enhance and improve platform and language support for the SDK.

This combination of online service and client SDK will leave the preview state later this year.

We understand the desire to have local recognition capabilities. They will not be available 'out-of-the-box' in our first SDK release (they are also not part of the current preview). One goal for the SDK is parity (in functionality and API) between platforms and languages. This needs a lot of work. Offline is not part of this right now, and I can't make any prediction here, neither in features nor timeline ...

So from my point of view, the new Speech Services and the SDK are the way forward. The goal is a unified API on all platforms and easy access to all Microsoft Speech Services. It requires a subscription key, and it requires that you are 'connected'. We are working hard to get both (server and client) out of preview status later this year.
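
To make the 'subscription key' and 'connected' requirements concrete, here is a minimal Python sketch of how a speech-to-text request to the service's short-audio REST endpoint could be assembled. It is an illustration under assumptions, not the client SDK described above: the helper name `build_recognition_request`, the default region `westus`, and the placeholder key are hypothetical, and the endpoint path and header names are taken from the public REST documentation as I understand it. As far as I know this REST endpoint only accepts short clips (about a minute), so it would not cover 5-10 minute recordings by itself.

```python
import urllib.request
from urllib.parse import urlencode


def build_recognition_request(audio_bytes, subscription_key,
                              region="westus", language="en-US"):
    """Build (but do not send) a short-audio speech-to-text request.

    Hypothetical helper for illustration; region and language defaults
    are assumptions, not values from the original answer.
    """
    # "format=detailed" asks the service for the more verbose JSON result.
    query = urlencode({"language": language, "format": "detailed"})
    url = (
        f"https://{region}.stt.speech.microsoft.com"
        f"/speech/recognition/conversation/cognitiveservices/v1?{query}"
    )
    headers = {
        # The subscription key is sent with every request; there is no
        # offline mode in this flow.
        "Ocp-Apim-Subscription-Key": subscription_key,
        # Assumes 16 kHz PCM audio in a WAV container.
        "Content-Type": "audio/wav; codecs=audio/pcm; samplerate=16000",
        "Accept": "application/json",
    }
    return urllib.request.Request(url, data=audio_bytes,
                                  headers=headers, method="POST")


# Placeholder audio bytes and key, only to show the call shape.
req = build_recognition_request(b"RIFF", "YOUR_SUBSCRIPTION_KEY")
```

Actually sending it (for example with `urllib.request.urlopen(req)`) needs a valid key and a network connection, which is exactly the constraint described above.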

Hope this helps ...

Wolfgang

answered Sep 20 '22 17:09

wolfma

Tags: speech-recognition, speech-to-text, microsoft-cognitive, microsoft-speech-platform, microsoft-speech-api