Is it possible to use the W3C Web Speech API to write JavaScript code that generates an audio file (WAV, Ogg or MP3) of a voice speaking a given text? I mean, I want to do something like:
window.speechSynthesis.speak(new SpeechSynthesisUtterance("0 1 2 3"))
but have the generated sound written to a file instead of being played through the speakers.
The Web Speech API is powerful and somewhat underused. However, there are a few annoying bugs and the SpeechRecognition interface is poorly supported. speechSynthesis works surprisingly well once you iron out all of its quirks and issues.
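This is not possible with the Web Speech API alone: speechSynthesis.speak() sends the utterance to the platform's speech engine for playback and returns nothing, so there is no Blob, ArrayBuffer or MediaStream to save. See "Re: MediaStream, ArrayBuffer, Blob audio result from speak() for recording?" and "How to implement option to return Blob, ArrayBuffer, or AudioBuffer from window.speechSynthesis.speak() call".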
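A minimal browser sketch makes this visible (the function name speakAndInspect is hypothetical, not part of any API). It calls speak() and logs what comes back: the return value is undefined, and the utterance events only report progress such as charIndex and elapsedTime, never audio samples. Outside a browser the sketch simply returns null.

```javascript
// Hypothetical helper: speaks text and shows that speak() yields no audio data.
function speakAndInspect(text) {
  // Bail out when not running in a browser that supports speech synthesis
  if (typeof window === "undefined" || !("speechSynthesis" in window)) {
    return null;
  }
  const utterance = new SpeechSynthesisUtterance(text);
  utterance.onboundary = (event) => {
    // Boundary events describe position and timing only, not audio
    console.log("boundary", event.name, event.charIndex, event.elapsedTime);
  };
  utterance.onend = () => console.log("finished speaking");
  const result = window.speechSynthesis.speak(utterance);
  console.log(result); // undefined: speak() returns no Blob, ArrayBuffer or MediaStream
  return result;
}
```

In a browser console, speakAndInspect("0 1 2 3") plays the audio through the speakers and logs undefined.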
The requirement can, however, be met with a library such as espeak or meSpeak; see "How to create or convert text to audio at chromium browser?".
// Fetch the meSpeak library source and inject it as a <script> element
fetch("https://gist.githubusercontent.com/guest271314/f48ee0658bc9b948766c67126ba9104c/raw/958dd72d317a6087df6b7297d4fee91173e0844d/mespeak.js")
  .then(response => response.text())
  .then(text => {
    const script = document.createElement("script");
    script.textContent = text;
    document.body.appendChild(script);
    // Load the meSpeak configuration and an English voice in parallel
    return Promise.all([
      new Promise(resolve => {
        meSpeak.loadConfig("https://gist.githubusercontent.com/guest271314/8421b50dfa0e5e7e5012da132567776a/raw/501fece4fd1fbb4e73f3f0dc133b64be86dae068/mespeak_config.json", resolve);
      }),
      new Promise(resolve => {
        meSpeak.loadVoice("https://gist.githubusercontent.com/guest271314/fa0650d0e0159ac96b21beaf60766bcc/raw/82414d646a7a7ef11bb04ddffe4091f78ef121d3/en.json", resolve);
      })
    ]);
  })
  .then(() => {
    // takes approximately 14 seconds to get here
    console.log(meSpeak.isConfigLoaded());
    // rawdata: "mime" makes speak() return the generated WAV as a base64 data URL
    // instead of playing it through the speakers
    console.log(meSpeak.speak("0 1 2 3", {
      amplitude: 100,
      pitch: 5,
      speed: 150,
      wordgap: 1,
      variant: "m7",
      rawdata: "mime"
    }));
  })
  .catch(err => console.error(err));
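The code above only logs the data URL. To actually get a file, one option is to decode the data URL into a Blob and offer it as a download. The sketch below assumes the value returned with rawdata: "mime" is a base64 data URL such as "data:audio/x-wav;base64,..."; dataUrlToBlob and saveBlob are hypothetical helper names, not part of meSpeak.

```javascript
// Hypothetical helper: decodes a base64 data URL into a Blob with its MIME type.
function dataUrlToBlob(dataUrl) {
  const match = /^data:([^;,]+)(;base64)?,(.*)$/.exec(dataUrl);
  if (!match || !match[2]) {
    throw new Error("expected a base64 data URL");
  }
  const mimeType = match[1];
  const binary = atob(match[3]); // base64 -> binary string
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  return new Blob([bytes], { type: mimeType });
}

// Hypothetical helper (browser only): saves a Blob through a temporary <a download> link.
function saveBlob(blob, filename) {
  const url = URL.createObjectURL(blob);
  const link = document.createElement("a");
  link.href = url;
  link.download = filename;
  document.body.appendChild(link);
  link.click();
  link.remove();
  // Revoke a little later so the download has time to start
  setTimeout(() => URL.revokeObjectURL(url), 1000);
}
```

Inside the final .then() of the meSpeak example, the call would look like saveBlob(dataUrlToBlob(meSpeak.speak("0 1 2 3", { rawdata: "mime" })), "speech.wav").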
There is also a workaround using MediaRecorder, which depends on the system's audio hardware and configuration; see "How to capture generated audio from window.speechSynthesis.speak() call?".
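A rough sketch of that workaround follows, assuming the system exposes an audio input that carries the speech output (for example a loopback or "monitor" device); without one, the recorder would only pick up the speakers through a microphone. The function recordSpeech and its deviceId parameter are hypothetical. Note that the recorded container is chosen by the browser (typically WebM or Ogg with Opus), not WAV.

```javascript
// Hypothetical helper: records speechSynthesis output from an (assumed) loopback input.
async function recordSpeech(text, deviceId) {
  // deviceId is an assumption: the id of an input device that carries system audio
  const stream = await navigator.mediaDevices.getUserMedia({
    audio: deviceId ? { deviceId: { exact: deviceId } } : true,
  });
  const recorder = new MediaRecorder(stream);
  const chunks = [];
  recorder.ondataavailable = (event) => chunks.push(event.data);
  const stopped = new Promise((resolve) => { recorder.onstop = resolve; });

  const utterance = new SpeechSynthesisUtterance(text);
  const spoken = new Promise((resolve, reject) => {
    utterance.onend = resolve;
    utterance.onerror = (event) => reject(new Error(event.error));
  });

  recorder.start();
  window.speechSynthesis.speak(utterance);
  try {
    await spoken; // wait until the speech engine reports the end of the utterance
  } finally {
    recorder.stop();
    await stopped; // the final dataavailable event fires before stop
    stream.getTracks().forEach((track) => track.stop());
  }
  // Container/codec is whatever the browser's MediaRecorder picked
  return new Blob(chunks, { type: recorder.mimeType });
}
```

For example, recordSpeech("0 1 2 3").then((blob) => console.log(blob.type, blob.size)) would log the recorded format and size, assuming the chosen input actually carries the synthesized speech.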