I need a speech-to-text system so that I can transcribe audio files to text. While researching this I found services offered by big companies, e.g. Amazon Transcribe, Google Speech-to-Text, IBM Watson, etc., and found that the Python libraries I looked at internally make use of those APIs.
What would be the steps if I wanted to create such a system myself? I could not find any detailed article on how to build your own speech recognition system.
The main reason I want to create my own system is that I cannot send the audio files to external APIs due to security constraints.
The goal: I have recordings of people speaking mostly in English, and I want to transcribe that audio to text.
Please let me know if you have any other ideas for doing this without sending audio files to external systems.
You can run OpenAI's Whisper locally on your own hardware. You'll only need a network connection once, to download the neural models. After that, none of the data you process will leave your computer.
To run it at reasonable speed you'll need a beefy GPU setup with CUDA properly configured so that PyTorch can use it. Running it on CPU will be orders of magnitude slower and may take days, depending on your required throughput.
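Before committing to a model size, you can check whether PyTorch actually sees your GPU; this is a quick sanity check, not part of Whisper itself:

```python
import torch

# If this prints "cpu", CUDA is not set up correctly (or no GPU is
# present) and Whisper will fall back to slow CPU inference.
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"PyTorch will run on: {device}")
```

You can then pass the result to `whisper.load_model("base.en", device=device)` to place the model explicitly.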