
How to identify multiple speakers and their text from an audio input?

I am using Microsoft's cognitive services. I have an audio input and need to identify multiple speakers and their individual text.

As I understand it, the Speaker Recognition API can identify different individuals and the Bing Speech API can convert speech to text. However, to do both at the same time, I need to manually split the audio file into pieces (based on pauses/silence) and then send each piece to the individual services. Is there a better way to do it? Should I switch to another ecosystem, such as AWS Lex/Polly or Google's offerings?

blackspacer asked Jan 31 '17 13:01




1 Answer

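You should try the IBM Watson Speech to Text API. It has a feature called speaker diarization, which labels each part of the transcript with the speaker who said it, so you get both the text and the speaker from a single request. That should fit your use case.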

More details here: https://www.ibm.com/blogs/watson/2016/12/look-whos-talking-ibm-debuts-watson-speech-text-speaker-diarization-beta/
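
To give an idea of how the diarization output can be turned into "who said what", here is a minimal sketch. It makes no network calls and assumes a response shaped like the one the service documents when speaker labels are requested: word `timestamps` as `[word, start, end]` plus a `speaker_labels` list with `from`, `to` and `speaker` fields. The `sample` data and the `speaker_turns` helper are made up for illustration and are not part of any SDK; check the field names against the current API reference before relying on them.

```python
from typing import Dict, List, Tuple


def speaker_turns(response: Dict) -> List[Tuple[int, str]]:
    """Group recognized words into consecutive (speaker, text) turns."""
    # Collect every word with its start time from the best alternative.
    words = []
    for result in response.get("results", []):
        best = result["alternatives"][0]
        for word, start, _end in best.get("timestamps", []):
            words.append((start, word))

    # Map each labeled start time to a speaker id (assumed response shape).
    speaker_at = {label["from"]: label["speaker"]
                  for label in response.get("speaker_labels", [])}

    turns: List[Tuple[int, str]] = []
    for start, word in sorted(words):
        speaker = speaker_at.get(start)
        if speaker is None:
            continue  # no speaker label for this word, so skip it
        if turns and turns[-1][0] == speaker:
            # Same speaker as the previous word: extend the current turn.
            turns[-1] = (speaker, turns[-1][1] + " " + word)
        else:
            # Speaker changed: start a new turn.
            turns.append((speaker, word))
    return turns


# Hypothetical, hand-written data in the assumed response shape.
sample = {
    "results": [{"alternatives": [{"timestamps": [
        ["hello", 0.0, 0.4], ["there", 0.4, 0.8], ["hi", 1.2, 1.5],
        ["how", 1.9, 2.1], ["are", 2.1, 2.3], ["you", 2.3, 2.6],
    ]}]}],
    "speaker_labels": [
        {"from": 0.0, "to": 0.4, "speaker": 0},
        {"from": 0.4, "to": 0.8, "speaker": 0},
        {"from": 1.2, "to": 1.5, "speaker": 1},
        {"from": 1.9, "to": 2.1, "speaker": 0},
        {"from": 2.1, "to": 2.3, "speaker": 0},
        {"from": 2.3, "to": 2.6, "speaker": 0},
    ],
}

for speaker, text in speaker_turns(sample):
    print(f"Speaker {speaker}: {text}")
```

Matching on the exact start time keeps the sketch short; with real responses you may prefer to match on time ranges in case a word and its label do not line up exactly.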

Bhavik Shah answered Dec 19 '22 16:12