Is there a way to diarize the output when using the speech_recognition package in Python?
I would appreciate any advice on this, or on whether it is possible at all.
I would also appreciate advice on then writing this information to a text file, with a line break at each change of speaker.
import speech_recognition as sr
from os import path

audio_file = path.join(path.dirname(path.realpath(__file__)), "RobertP.wav")

r = sr.Recognizer()
with sr.AudioFile(audio_file) as source:
    audio = r.record(source)

try:
    # show_all=True returns the full response instead of only the top transcript
    txt = r.recognize_google(audio, show_all=True)
except (sr.UnknownValueError, sr.RequestError):
    print("Didn't work.")
    txt = ""  # keep txt defined so the write below does not fail

with open("tester.txt", "w+") as f:
    f.write(str(txt))
Note: apologies for my novice-ness.
Speaker diarization is currently in beta in the Google Speech-to-Text API. You can find the documentation of this feature here. The output can be handled in many ways. The following is an example (based on this Medium article):
import io

def transcribe_file_with_diarization(speech_file):
    """Transcribe the given audio file synchronously with diarization."""
    # Note: speech.enums and the positional recognize(config, audio) call
    # belong to google-cloud-speech releases before 2.0.
    from google.cloud import speech_v1p1beta1 as speech
    client = speech.SpeechClient()

    with io.open(speech_file, 'rb') as audio_file:
        content = audio_file.read()

    audio = {"content": content}
    encoding = speech.enums.RecognitionConfig.AudioEncoding.LINEAR16
    sample_rate_hertz = 48000
    language_code = 'en-US'
    enable_speaker_diarization = True
    enable_automatic_punctuation = True
    diarization_speaker_count = 4

    config = {
        "encoding": encoding,
        "sample_rate_hertz": sample_rate_hertz,
        "language_code": language_code,
        "enable_speaker_diarization": enable_speaker_diarization,
        "enable_automatic_punctuation": enable_automatic_punctuation,
        # Optional:
        "diarization_speaker_count": diarization_speaker_count
    }

    print('Waiting for operation to complete...')
    response = client.recognize(config, audio)

    # The transcript within each result is separate and sequential per result.
    # However, the words list within an alternative includes all the words
    # from all the results thus far. Thus, to get all the words with speaker
    # tags, you only have to take the words list from the last result:
    result = response.results[-1]
    words_info = result.alternatives[0].words

    speaker1_transcript = ""
    speaker2_transcript = ""
    speaker3_transcript = ""
    speaker4_transcript = ""

    # Collect each speaker's words:
    for word_info in words_info:
        if word_info.speaker_tag == 1:
            speaker1_transcript += word_info.word + ' '
        elif word_info.speaker_tag == 2:
            speaker2_transcript += word_info.word + ' '
        elif word_info.speaker_tag == 3:
            speaker3_transcript += word_info.word + ' '
        elif word_info.speaker_tag == 4:
            speaker4_transcript += word_info.word + ' '

    # Printing out the output:
    print("speaker1: '{}'".format(speaker1_transcript))
    print("speaker2: '{}'".format(speaker2_transcript))
    print("speaker3: '{}'".format(speaker3_transcript))
    print("speaker4: '{}'".format(speaker4_transcript))
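For the second part of the question (a text file with a break at each change of speaker), one option is to group consecutive words by speaker_tag instead of collecting everything per speaker, so the order of the conversation is preserved. The following is only a minimal sketch and not part of the example above: the helper names group_by_speaker and write_transcript are made up for illustration, and the Word namedtuple merely stands in for the word_info objects (using only their word and speaker_tag attributes) so that the snippet runs without calling the API. Turns are separated here by a blank line.

```python
from collections import namedtuple
from itertools import groupby

# Hypothetical stand-in for the API's word objects; it carries only the
# two attributes the example above reads: .word and .speaker_tag.
Word = namedtuple("Word", ["word", "speaker_tag"])


def group_by_speaker(words_info):
    """Collapse consecutive words with the same speaker_tag into (tag, text) turns."""
    turns = []
    for tag, words in groupby(words_info, key=lambda w: w.speaker_tag):
        turns.append((tag, " ".join(w.word for w in words)))
    return turns


def write_transcript(words_info, out_path):
    """Write one 'speaker N: text' paragraph per turn, separated by blank lines."""
    turns = group_by_speaker(words_info)
    with open(out_path, "w", encoding="utf-8") as f:
        f.write("\n\n".join(f"speaker {tag}: {text}" for tag, text in turns))
        f.write("\n")
```

Assuming words_info is the list taken from the last result as in the function above, this could be called as write_transcript(words_info, "diarized.txt"), where the file name is arbitrary.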