
Speech Synthesis - Creating Custom Voices [closed]

Is it possible, programmatically, to take a sample of someone's voice and extract its unique characteristics so that they can be used to create synthesised speech?

For example, person A records himself. A unique voice profile is extracted from this sample and used to build a synthetic voice. People could then use that voice in text-to-speech software, so that any text they type is read aloud in person A's voice.

Is this feasible today? I know there are companies that do this professionally, but in general, can a piece of software do it?

Travier asked Apr 08 '14 17:04



1 Answer

With speaker adaptation methods you can get usable results from comparatively few training samples, but you should still have a few hundred sentences from the person, preferably with a phonetic transcription.

We once ran this as a small lab exercise in which students recorded their own voices and trained a voice model using HTS (http://hts.sp.nitech.ac.jp/). The simplest approach with HTS is to download the "Speaker dependent training demo" from that page and replace the training speech samples with your own recordings (of the same sentences!). We did this for another language, though, using our own package.
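Before swapping in your own recordings, it can help to check that they line up one-to-one with the demo's samples. The following is only a rough sketch in Python, not part of HTS: it assumes both sets are stored as PCM WAV files with matching file names in two flat folders, which may not match how the demo actually lays out or encodes its data. The function names and folder names are made up for illustration.

```python
from pathlib import Path
import wave


def wav_info(path):
    """Return (sample_rate, channels, sample_width_bytes) for a PCM WAV file."""
    with wave.open(str(path), "rb") as w:
        return w.getframerate(), w.getnchannels(), w.getsampwidth()


def compare_recordings(demo_dir, own_dir):
    """Compare two folders of WAV files by file name and audio format.

    Hypothetical helper; assumes one WAV file per sentence in each folder.
    Returns (missing, extra, mismatched):
      missing    - names present in demo_dir with no recording in own_dir
      extra      - names present in own_dir that the demo does not have
      mismatched - names whose sample rate, channel count, or sample width differ
    """
    demo = {p.stem: p for p in Path(demo_dir).glob("*.wav")}
    own = {p.stem: p for p in Path(own_dir).glob("*.wav")}
    missing = sorted(demo.keys() - own.keys())
    extra = sorted(own.keys() - demo.keys())
    mismatched = sorted(
        name
        for name in demo.keys() & own.keys()
        if wav_info(demo[name]) != wav_info(own[name])
    )
    return missing, extra, mismatched
```

You would call it as `compare_recordings("demo_wav", "my_wav")` (both folder names are placeholders) and fix anything reported before starting training. If the demo data is in another format, the same name-matching idea applies but the format check would need to change.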

I think MaryTTS (http://mary.dfki.de/) has more convenient tools to assist with this process, but I've never worked with it.

Still, for a high-quality voice you should have thousands of recorded sentences.

Markus Toman answered Sep 22 '22 16:09