I need a speech-to-text system so that I can transcribe audio files to text. While researching this I found services offered by big companies, e.g. Amazon Transcribe, Google Speech-to-Text, IBM Watson, etc., and found that the Python libraries I looked at internally make use of those APIs.
What would be the steps if I wanted to create such a system myself? I could not find any detailed article on how to build your own speech recognition system.
The main reason I want to create my own system is that I cannot send the audio files to external APIs due to security constraints.
The goal: I have recordings of people speaking mostly in English, and I want to transcribe that audio to text.
Please let me know if you have any other ideas for doing this without sending audio files to external systems.
You can run OpenAI's Whisper locally on your own hardware. You'll only need a network connection once, to download the neural models. After that, none of the data you process will leave your computer.
To run it at reasonable speed you'll need a beefy GPU setup with CUDA properly configured so that PyTorch can use it. Running it on CPU will be orders of magnitude slower and may take days, depending on your required throughput.
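Before committing to a model size, you can check whether PyTorch actually sees your GPU; this is a quick sanity check, not part of Whisper itself:

```python
import torch

# If this prints "cpu", CUDA is not set up correctly (or no GPU is
# present) and Whisper will fall back to slow CPU inference.
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"PyTorch will run on: {device}")
```

You can then pass the result to `whisper.load_model("base.en", device=device)` to place the model explicitly.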