We use a solution in C#.net where someone can call a phone number and speak a person's first name and then last name. The name is then entered in a guest registry on our website. We use an XML dictionary file with 5,000 first names and 89,000 last names that we got from the US Census. We are using the Microsoft.Speech.Recognition library (maybe that's the problem).
Our problem is that even with relatively easy names like Joshua McDaniels we are getting about a 30% fail rate. The performance (speed-wise) is fine; it just fails to recognize a good portion of the names.
Now, I understand that ultimately the quality of the spoken name will dictate, sorry for the pun, how well the system performs, but we would like to get close to 99% in "laboratory" conditions, with perfect enunciation and no accent, and then call it good. But even after several trials with the same person speaking the same name on the same phone in the same environment, we are getting a 25% fail rate.
My question is: does anyone have an idea of a better way to go after this? We thought of maybe trying to use an API, so that the matches would be more relevant and current.
The current state of the technology is that it is very hard to recognize names, let alone a large list of them. You can recognize names from a phone book (500 entries) with good quality, but with thousands of them it is very hard. Speech recognition engines are certainly not designed for that, in particular offline ones like System.Speech.
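To get a rough feel for why the size of the list matters, here is a small sketch in Python (chosen for brevity; the idea ports to C# directly). It uses Soundex, a classic spelling-based phonetic key, as a very crude stand-in for "sounds alike". This is only an assumption for illustration: a real recognizer compares acoustic models, not Soundex keys, and the function names `soundex` and `confusable_fraction` as well as the sample names are made up for this example.

```python
from collections import defaultdict

# Standard American Soundex consonant classes.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(name):
    """Return the 4-character Soundex key for a name (crude phonetic proxy)."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    key = letters[0]
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate two letters with the same code
        code = CODES.get(ch, "")  # vowels (and Y) map to "" and reset prev
        if code and code != prev:
            key += code
        prev = code
    return (key + "000")[:4]


def confusable_fraction(names):
    """Fraction of distinct names sharing a Soundex key with another name."""
    groups = defaultdict(set)
    for n in names:
        groups[soundex(n)].add(n.lower())
    total = sum(len(g) for g in groups.values())
    clashing = sum(len(g) for g in groups.values() if len(g) > 1)
    return clashing / total if total else 0.0


# Hypothetical sample lists, not taken from the Census data.
small = ["Joshua", "Michael", "Sarah"]
large = small + ["McDaniel", "McDaniels", "McDonald"]

print(confusable_fraction(small))  # 0.0
print(confusable_fraction(large))  # 0.5
```

Running something like this over the real 5,000 and 89,000 name files should give a ballpark for how many entries have at least one near neighbour under this proxy. The more such neighbours a grammar contains, the more often the engine has to choose between candidates that sound almost the same.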
You might get much better results with online systems such as https://www.projectoxford.ai, which use advanced DNN acoustic models and bigger vocabularies.
Entire companies have been built around the ability to recognize large name lists; Novauris, for example, used patented technology for that. You might consider building something similar using an open source engine, but it would be a large undertaking in any case.