API to break voice into phonemes / synthesize new speech given speech samples?

You know those movies where the tech geeks record someone's voice and their software breaks it into phonemes, which they can then use to type in any phrase and make it sound as if the target is saying it?

Does software like that exist as an API? I don't even know what to Google.

asked Aug 11 '11 01:08

AShelly

4 Answers

There is no such software. Breaking arbitrary speech into its constituent phonemes is only a partially solved problem: speech-to-text software is still imperfect, as is text-to-speech.

The idea is to reproduce the timbre of the target's voice. Even if you could segment the audio perfectly, reordering the phonemes would produce audio with unnatural cadence and intonation, not to mention splicing artifacts. At that point you're getting into smoothing, time-scaling, and pitch correction, all of which are possible and well understood in theory but work poorly on real-world data, especially when the sample is as short as a single phoneme and the timbre must be preserved.
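To make the "splicing artifacts" point concrete: butting two recorded segments together usually leaves a discontinuity that is heard as a click. The simplest form of smoothing is a crossfade over a short overlap. This is a minimal sketch, assuming mono audio held as plain lists of float samples; the function name and the linear fade are illustrative choices, not any particular library's method.

```python
def crossfade_splice(a, b, overlap):
    """Join two mono sample sequences, blending the last `overlap` samples
    of `a` into the first `overlap` samples of `b` with a linear fade
    instead of butting them together (which tends to click)."""
    if overlap < 0 or overlap > min(len(a), len(b)):
        raise ValueError("overlap must be between 0 and the shorter segment's length")
    if overlap == 0:
        return list(a) + list(b)
    head = list(a[:len(a) - overlap])   # untouched start of the first segment
    tail = list(b[overlap:])            # untouched end of the second segment
    blended = []
    for i in range(overlap):
        w = (i + 1) / (overlap + 1)     # weight of `b` ramps up toward 1
        blended.append((1 - w) * a[len(a) - overlap + i] + w * b[i])
    return head + blended + tail
```

This only hides the seam; it does nothing about the mismatched pitch and timing on either side of it, which is why real systems reach for pitch-synchronous overlap-add (PSOLA) and similar time-scale and pitch modification techniques, and why the result still tends to sound stitched together.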

These problems are compounded on the phonetic side by allophonic variation in sounds based on accent and surrounding phonemes; in order to faithfully produce even a low-quality approximation of the audio, you'd need a detailed understanding of the target's language, accent, and speech patterns.
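One way concatenative synthesizers cope with allophonic variation is to store each phoneme together with its neighbours (a "triphone") and only fall back to a context-free recording when no match exists. For example, the /t/ in "tea" is normally aspirated while the /t/ in "eat" often is not, so a /t/ cut from one word may sound wrong in the other. The sketch below shows that lookup-with-fallback idea; the inventory layout, the ARPAbet-style labels, and the clip names are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# (left neighbour, phone, right neighbour); None means "no context".
Context = Tuple[Optional[str], str, Optional[str]]


def select_units(phones: List[str], inventory: Dict[Context, str]) -> List[str]:
    """For each phone, prefer a recorded unit whose neighbours match exactly,
    else fall back to a context-free recording of the same phone."""
    chosen = []
    for i, p in enumerate(phones):
        left = phones[i - 1] if i > 0 else None
        right = phones[i + 1] if i + 1 < len(phones) else None
        for key in ((left, p, right), (None, p, None)):
            if key in inventory:
                chosen.append(inventory[key])
                break
        else:
            raise KeyError(f"no recording of phone {p!r}")
    return chosen


inventory = {
    (None, "T", "IY"): "tea_T.wav",      # hypothetical clip: /t/ as in "tea"
    (None, "T", None): "generic_T.wav",
    (None, "IY", None): "generic_IY.wav",
}
print(select_units(["T", "IY"], inventory))   # "tea"
print(select_units(["IY", "T"], inventory))   # "eat"
```

Every time the fallback branch is taken, you are splicing in a sound recorded in the wrong context, and with only a short sample of the target's voice that happens almost constantly.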

Furthermore, your ultimate problem is one of social engineering, and people are not easy to fool when it comes to the voices of people they know. Even with a large corpus of input data, at best you could get a short low-quality sample, hardly enough for a conversation.

So while it's certainly possible, it's difficult; even if it existed, it wouldn't always be good enough.

answered Sep 17 '22 03:09

Jon Purdy

SRI International (the research institute where Siri was originally developed) has an SDK called EduSpeak, which takes audio input and breaks it down into individual phonemes. I know this because I sat through a demo of the product about a week ago. During the demo, the presenter showed us an application built with the SDK. The application gave him a few lines of text to read. After he read the text, the application displayed a bar chart in which each bar represented a phoneme from his speech. The height of each bar was a score for how well that phoneme was pronounced (the presenter was not a native English speaker, so he received lower scores on some phonemes than on others). The presenter could also click on an individual bar to play back just that phoneme from the original audio.
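The click-to-play feature described above only needs a phoneme alignment, i.e. each phoneme's start and end time in the recording. I don't know what EduSpeak's API actually returns, so the sketch below assumes a hypothetical alignment of `(phoneme, start_seconds, end_seconds)` tuples and uses only Python's standard `wave` module to cut one phoneme out of a WAV file.

```python
import io
import wave


def extract_segment(wav_bytes: bytes, start_s: float, end_s: float) -> bytes:
    """Return a new WAV file (as bytes) containing only [start_s, end_s)
    of the input, with the same channels, sample width and frame rate."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as src:
        params = src.getparams()
        rate = src.getframerate()
        n = src.getnframes()
        first = min(max(0, int(start_s * rate)), n)   # clamp to the file
        last = min(n, max(first, int(end_s * rate)))
        src.setpos(first)
        frames = src.readframes(last - first)
    out = io.BytesIO()
    with wave.open(out, "wb") as dst:
        dst.setparams(params)       # frame count is corrected when writing
        dst.writeframes(frames)
    return out.getvalue()


# Hypothetical alignment for the word "hi", as a recognizer might report it:
alignment = [("HH", 0.00, 0.08), ("AY", 0.08, 0.30)]
```

With such an alignment, playing back one bar is just `extract_segment(recording, start, end)` for that phoneme's entry. Reassembling the pieces into new speech is the hard part.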

So yes, software exists that divides audio up by phoneme, and it does a very good job of it. Whether those phonemes can be reassembled into new speech is another question. If we end up getting a trial version of the SDK, I'll try it out and let you know.

answered Sep 21 '22 03:09

David Jones

If your aim is to mimic someone else's voice, another approach is to convert your own voice into theirs (instead of assembling phonemes). It is (surprisingly) called voice conversion; see e.g. http://www.busim.ee.boun.edu.tr/~speech/projects/Voice_Conversion.htm
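A common baseline for the pitch part of voice conversion is to shift and scale the source speaker's log-F0 so that it has the target speaker's mean and standard deviation. The sketch below assumes you already have per-frame F0 values (in Hz, with 0 marking unvoiced frames) from some pitch tracker; the function names and sample values are hypothetical. It converts pitch only: a real system also has to map the spectral envelope, which is where most of the timbre lives.

```python
import math
from statistics import mean, stdev


def fit_logf0_stats(f0_values):
    """Mean and sample standard deviation of log-F0 over voiced frames (F0 > 0)."""
    logs = [math.log(f) for f in f0_values if f > 0]
    return mean(logs), stdev(logs)


def convert_f0(f0_values, src_stats, tgt_stats):
    """Give each voiced source frame's log-F0 the target speaker's mean and
    spread; unvoiced frames (F0 == 0) pass through as 0.0."""
    mu_s, sd_s = src_stats
    mu_t, sd_t = tgt_stats
    return [
        math.exp((math.log(f) - mu_s) / sd_s * sd_t + mu_t) if f > 0 else 0.0
        for f in f0_values
    ]


# Hypothetical training F0 tracks for a lower and a higher voice.
src_stats = fit_logf0_stats([100.0, 200.0, 0.0, 100.0, 200.0])
tgt_stats = fit_logf0_stats([200.0, 400.0, 200.0, 400.0])
converted = convert_f0([100.0, 0.0, 200.0], src_stats, tgt_stats)
```

Here the target's pitch is exactly one octave above the source with the same spread, so `converted` comes out at roughly `[200.0, 0.0, 400.0]`.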

answered Sep 18 '22 03:09

Itamar Katz

The technologies are called "speech synthesis" (text-to-speech) and "speech recognition".

The Java API for this is the Java Speech API (JSAPI).

Apple has an API for this (Apple speech).

Microsoft has several; one is discussed here (Vista speech).

answered Sep 19 '22 03:09

stimpy



Tags: api, signal-processing, audio, phoneme