Is it possible, programmatically, to take someone's voice sample and derive a unique tone/property from it that could be used to create synthesised speech?
For example, person A records himself. A unique tone is derived from this voice sample and turned into synthesised speech. People could then use this synthetic voice in Text-to-Speech software, writing any text they want and having it read in person A's voice.
Is this possible with today's technology? I know that there are companies that do this professionally, but generally, is it possible for a piece of software to do this?
Go to Text-to-Speech > Custom Voice > select a project, and select Set up voice talent. Select Add voice talent. Next, to define voice characteristics, select Target scenario. Then describe your Voice characteristics.
Speech synthesis is the computer-generated simulation of human speech. It is used to translate written information into aural information where it is more convenient, especially for mobile applications such as voice-enabled e-mail and unified messaging.
Synthesized speech can be created by concatenating pieces of recorded speech that are stored in a database. Systems differ in the size of the stored speech units; a system that stores phones or diphones provides the largest output range, but may lack clarity.
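To make the concatenation idea concrete, here is a minimal sketch, not a real synthesiser. It assumes a tiny hypothetical unit database (`UNIT_DB`) keyed by diphone labels, uses sine tones as stand-ins for recorded speech, and joins units with a short linear crossfade. The sample rate, labels, and fade length are all illustrative assumptions.

```python
import math

SAMPLE_RATE = 16000  # assumed sample rate in Hz


def tone(freq_hz, duration_s):
    """Stand-in for a recorded unit: a sine tone instead of real speech."""
    n = int(SAMPLE_RATE * duration_s)
    return [math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE) for i in range(n)]


# Hypothetical unit database: diphone label -> waveform samples.
UNIT_DB = {
    "h-e": tone(220, 0.05),
    "e-l": tone(330, 0.05),
    "l-o": tone(440, 0.05),
}


def concatenate(labels, db, fade=80):
    """Join stored units in order, crossfading `fade` samples at each seam."""
    out = []
    for label in labels:
        unit = db[label]  # raises KeyError if the unit was never recorded
        if out and fade > 0:
            k = min(fade, len(out), len(unit))
            tail_start = len(out) - k
            # Blend the end of the output with the start of the next unit.
            for i in range(k):
                w = i / k
                out[tail_start + i] = out[tail_start + i] * (1 - w) + unit[i] * w
            out.extend(unit[k:])
        else:
            out.extend(unit)
    return out


speech = concatenate(["h-e", "e-l", "l-o"], UNIT_DB)
print(len(speech), "samples")
```

A real system would store units cut from actual recordings and choose among many candidates per label. The lookup-and-join structure is the part this sketch is meant to show.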
Custom Voice delivers a Text-to-Speech (TTS) model that sounds as similar to your supplied audio data as possible. Google will send you a script for the voice recordings after your use case is approved. We suggest that you find and work with a voice actor who represents the custom voice you're aiming for.
Using speaker adaptation methods, you can achieve some results with comparatively few training samples, but you should still have a few hundred sentences from the person, preferably with a phonetic transcription.
We once had this as a small lab exercise for students: record their own voices and train a voice model using HTS (http://hts.sp.nitech.ac.jp/). The simplest approach using HTS is to download the "Speaker dependent training demo" from this page and replace the training speech samples with your own recordings (of the same sentences!). We did this for another language with our own package, though.
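Since the recordings have to cover the same sentences as the demo data, a quick sanity check before training can save time. The sketch below is only an illustration: it assumes each utterance is one file whose stem is the utterance ID, and the file names shown are hypothetical, not the actual layout of the HTS demo.

```python
from pathlib import Path


def utterance_ids(filenames):
    """Map filenames such as 'utt_0001.wav' to their stems ('utt_0001')."""
    return {Path(name).stem for name in filenames}


def compare_recordings(reference_files, own_files):
    """Return (missing, extra) utterance IDs, each as a sorted list."""
    ref = utterance_ids(reference_files)
    own = utterance_ids(own_files)
    return sorted(ref - own), sorted(own - ref)


# Hypothetical file names, for illustration only.
reference = ["utt_0001.raw", "utt_0002.raw", "utt_0003.raw"]
mine = ["utt_0001.wav", "utt_0003.wav", "utt_0099.wav"]

missing, extra = compare_recordings(reference, mine)
print("missing:", missing)
print("extra:", extra)
```

In practice you would build the two lists from directory listings (for example with `Path(some_dir).iterdir()`) and re-record anything reported as missing.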
I think MaryTTS (http://mary.dfki.de/) has some more convenient tools to assist with this process but I've never worked with that.
But still - for high quality voices, you should have thousands of recorded sentences.