In my app, I have to compare a live recording against a previously stored voice command; if it matches (not only the text but also the identified person's voice), then perform the necessary action.
1. Match voice commands from the same person.
2. Match the command's text.
I have tried several approaches, but none of them work as I expect.
First: speech-to-text libraries such as OpenEars and SpeechKit, but these libraries only convert speech to text.
Result: failed to meet my expectations.
Second: audio fingerprinting.
acrcloud library: I recorded a command, stored that MP3 file on the acrcloud server, and tried to match it against a live recording (spoken by me). It doesn't match; but when I play back the very same recording (the recorded MP3 file of my voice) that was uploaded to the acrcloud server, then it matches. Result: failed to meet my expectations.
API.AI: this works like speech to text. I stored some text commands on its server, and when anyone speaks the same command the match succeeds, so it cannot tell who is speaking. Result: failed to meet my expectations.
Please suggest how I can solve this problem for an iOS application.
This is how I would approach this, if I understand your requirements correctly:
You will need to compare the audio spectrum of each recording to match the person (look at vDSP in the Accelerate framework). An FFT analysis with a 1024-sample window should be enough (if not, try doubling it for more detail). I would start the comparison with the 5-10 strongest peaks in the spectrum and experiment from there. Check out EZAudio for an easy FFT implementation to get you started; a sketch of the peak extraction follows below.
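Here is a minimal sketch of that spectral-peak idea written against vDSP directly (without EZAudio), assuming you already have 1024 mono Float samples per window. The function names and the peak-overlap comparison are my own illustration for getting started, not a standard API:

import Accelerate
import Foundation

/// Returns the indices of the `peakCount` strongest frequency bins in one
/// window of mono samples. Assumes `samples.count == windowSize` and that
/// `windowSize` is a power of two (1024 here, per the suggestion above).
func spectrumPeaks(samples: [Float], windowSize: Int = 1024, peakCount: Int = 10) -> Set<Int> {
    let log2n = vDSP_Length(log2(Float(windowSize)))
    guard samples.count == windowSize,
          let fftSetup = vDSP_create_fftsetup(log2n, FFTRadix(kFFTRadix2)) else { return [] }
    defer { vDSP_destroy_fftsetup(fftSetup) }

    var real = [Float](repeating: 0, count: windowSize / 2)
    var imag = [Float](repeating: 0, count: windowSize / 2)
    var magnitudes = [Float](repeating: 0, count: windowSize / 2)

    real.withUnsafeMutableBufferPointer { realPtr in
        imag.withUnsafeMutableBufferPointer { imagPtr in
            var split = DSPSplitComplex(realp: realPtr.baseAddress!,
                                        imagp: imagPtr.baseAddress!)
            // Pack the real samples into split-complex form, run a real FFT,
            // then take the squared magnitude of every frequency bin.
            samples.withUnsafeBufferPointer { samplesPtr in
                samplesPtr.baseAddress!.withMemoryRebound(to: DSPComplex.self,
                                                          capacity: windowSize / 2) {
                    vDSP_ctoz($0, 2, &split, 1, vDSP_Length(windowSize / 2))
                }
            }
            vDSP_fft_zrip(fftSetup, &split, 1, log2n, FFTDirection(FFT_FORWARD))
            vDSP_zvmags(&split, 1, &magnitudes, 1, vDSP_Length(windowSize / 2))
        }
    }

    // The loudest bins form a crude spectral signature for this window.
    let topBins = magnitudes.enumerated()
        .sorted { $0.element > $1.element }
        .prefix(peakCount)
        .map { $0.offset }
    return Set(topBins)
}

// Comparing two recordings then reduces to counting shared peak bins,
// e.g. treating an overlap of 7 out of 10 as "same voice" as a first guess.
func peakOverlap(_ a: Set<Int>, _ b: Set<Int>) -> Int {
    return a.intersection(b).count
}

In practice you would run this over many windows of each recording and aggregate the results; the single-window version here is just to show the vDSP plumbing.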
Use a speech-to-text library to match the text. Accents usually distort recognition results considerably, so instead of matching against a command you specified as text, I would start by running both recordings through the recognizer and comparing the two transcripts; see the sketch after this paragraph.
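As one illustration, here is a minimal sketch using Apple's Speech framework (iOS 10+); the answer only says "a speech-to-text library", so this particular framework is my substitution, and the two file URLs are hypothetical placeholders. You would also need to call SFSpeechRecognizer.requestAuthorization and add the usage-description keys to Info.plist before this will run:

import Speech

/// Transcribes one audio file and hands back the recognized text (or nil).
func transcribe(fileAt url: URL, completion: @escaping (String?) -> Void) {
    guard let recognizer = SFSpeechRecognizer(), recognizer.isAvailable else {
        completion(nil)
        return
    }
    let request = SFSpeechURLRecognitionRequest(url: url)
    recognizer.recognitionTask(with: request) { result, error in
        if let result = result, result.isFinal {
            // Final transcript for the whole file.
            completion(result.bestTranscription.formattedString)
        } else if error != nil {
            completion(nil)
        }
        // Partial (non-final) results are ignored here.
    }
}

// Placeholder paths: substitute the stored command and the live recording.
let storedCommandURL = URL(fileURLWithPath: "stored_command.m4a")
let liveRecordingURL = URL(fileURLWithPath: "live_recording.m4a")

// Transcribe both files, then compare the transcripts case-insensitively.
transcribe(fileAt: storedCommandURL) { storedText in
    transcribe(fileAt: liveRecordingURL) { liveText in
        guard let storedText = storedText, let liveText = liveText else { return }
        let textMatches = storedText.caseInsensitiveCompare(liveText) == .orderedSame
        print("Command text matches:", textMatches)
    }
}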
Good luck!