Is there a way to force Google Speech api to return only words as response?

I am using this Google API:

https://www.google.com/speech-api/v2/recognize?output=json&lang="+ language_code+"&key="My key"

for speech recognition, and it works very well.

The issue is with numbers: if I say "one two three four", the result is 1234, and if I say "one thousand two hundred thirty four", the result is still 1234.

The same issue occurs in other languages. For example, the German word for eleven is elf; if you say elf, the result is 11 instead of elf.

I know we have no control over the API, but are there any parameters or hacks we can add to the request to force it to return only words?

The response sometimes contains the correct result among its alternatives, but not always.

These are sample responses:

1) When I say "one two three four"

{"result":[{"alternative":[{"transcript":"1234","confidence":0.47215959},{"transcript":"1 2 3 4","confidence":0.25},{"transcript":"one two three four","confidence":0.25},{"transcript":"1 2 34","confidence":0.33333334},{"transcript":"1 to 34","confidence":1}],"final":true}],"result_index":0}

2) When I say "one thousand two hundred thirty four"

{"result":[{"alternative":[{"transcript":"1234","confidence":0.94247383},{"transcript":"1.254","confidence":1},{"transcript":"1284","confidence":1},{"transcript":"1244","confidence":1},{"transcript":"1230 4","confidence":1}],"final":true}],"result_index":0}

What I have done:

I check whether the result contains a digit, then put a space between its characters and check whether the same sequence appears among the alternatives. For example, the result 1234 becomes 1 2 3 4; if a matching alternative exists, I use that form and then convert it to words. In the second case there is no 1 2 3 4 alternative, so I stick with the original result.

This is the code:

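  // Assumes `output` holds the top transcript and `jsonArray2` is the "alternative" array.
  // Requires java.util.regex.Pattern and java.util.regex.Matcher.
  Pattern numberPattern = Pattern.compile("[0-9]");
  Matcher matcher = numberPattern.matcher(output);
  if (matcher.find()) {
      // Put a space after every character: "1234" -> "1 2 3 4"
      StringBuilder spaced = new StringBuilder();
      for (char c : output.toCharArray()) {
          spaced.append(c).append(' ');
      }
      String digits = spaced.toString().trim();

      // Look for the spaced form among the remaining alternatives (index 0 is skipped)
      for (int i = 1; i < jsonArray2.length(); i++) {
          String value = jsonArray2.getJSONObject(i).getString("transcript");
          if (digits.equals(value.trim())) {
              output = digits;
          }
      }
  }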
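
The step that converts the spaced digits to words is mentioned above but not shown. A minimal sketch of that step follows, assuming English digit names; `DigitSpeller` and `spell` are hypothetical names for illustration, not part of any Google API.

```java
import java.util.StringJoiner;

// Hypothetical helper: spells out a space-separated digit string.
class DigitSpeller {
    private static final String[] WORDS = {
        "zero", "one", "two", "three", "four",
        "five", "six", "seven", "eight", "nine"
    };

    // Converts e.g. "1 2 3 4" to "one two three four".
    // Returns the input unchanged if any token is not a single ASCII digit.
    static String spell(String spacedDigits) {
        StringJoiner words = new StringJoiner(" ");
        for (String token : spacedDigits.trim().split("\\s+")) {
            if (token.length() != 1) {
                return spacedDigits;
            }
            char c = token.charAt(0);
            if (c < '0' || c > '9') {
                return spacedDigits;
            }
            words.add(WORDS[c - '0']);
        }
        return words.toString();
    }

    public static void main(String[] args) {
        System.out.println(spell("1 2 3 4"));
        System.out.println(spell("13 4 8")); // left as-is: "13" is not a single digit
    }
}
```

Multi-digit tokens are deliberately left untouched, since there is no way to tell from the digits alone how they were spoken.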

The problem is that when I say "thirteen four eight", this method splits 13 into "one three", so it is not a reliable solution.
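
A possible fallback that avoids splitting digits at all is to prefer an alternative that already contains no digits, such as "one two three four" in the first sample response. The sketch below assumes the transcripts have already been extracted from the `alternative` array into a list; `AlternativePicker` and `preferWords` are hypothetical names. It only helps when the API happens to return a words-only alternative, which the second sample shows is not always the case.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: picks a digit-free transcript when one is available.
class AlternativePicker {
    // Returns the first non-empty transcript without ASCII digits.
    // Falls back to the first transcript, or "" for an empty list.
    static String preferWords(List<String> transcripts) {
        for (String t : transcripts) {
            boolean hasDigit = t.chars().anyMatch(c -> c >= '0' && c <= '9');
            if (!t.isEmpty() && !hasDigit) {
                return t;
            }
        }
        return transcripts.isEmpty() ? "" : transcripts.get(0);
    }

    public static void main(String[] args) {
        // Transcripts copied from the two sample responses above.
        List<String> first = Arrays.asList(
            "1234", "1 2 3 4", "one two three four", "1 2 34", "1 to 34");
        List<String> second = Arrays.asList(
            "1234", "1.254", "1284", "1244", "1230 4");
        System.out.println(preferWords(first));
        System.out.println(preferWords(second));
    }
}
```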

Update

I tried the new Cloud Speech API (https://cloud.google.com/speech/) and it is a little better than v2. The result for "one two three four" now comes back in words, and my workaround still works for it. But when I say "thirteen four eight", the result is the same as in v2.

Also, elf is still 11 in German.

I also tried speech_context, but that didn't work either.

asked Mar 14 '17 11:03

sunil sunny

1 Answer

Take a look at this question and answer.

You can give the API "speech context" hints, like this:

"speech_context": {
  "phrases":["zero", "one", "two", ... "nine", "ten", "eleven", ... "twenty", "thirty", ..., "ninety"]
}

I imagine this could work for other languages too, like German.

"speech_context": {
  "phrases":["eins", "zwei", "drei", ..., "elf", "zwölf" ... ]
 }
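
As a rough illustration of how such a fragment could be assembled in Java, here is a sketch that builds it from a list of phrases using only the standard library. The field names `speech_context` and `phrases` are taken from the snippets above; the class and method names are hypothetical, and the field name may differ in the API version you call, so check its reference. Hints like these bias recognition toward the given phrases but are not guaranteed to change how numbers are formatted.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper: renders phrase hints as a JSON fragment.
class SpeechContextBuilder {
    // Quotes one phrase as a JSON string, escaping backslashes and double quotes.
    private static String quote(String phrase) {
        return "\"" + phrase.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Builds: "speech_context": {"phrases": ["a", "b", ...]}
    static String build(List<String> phrases) {
        String joined = phrases.stream()
            .map(SpeechContextBuilder::quote)
            .collect(Collectors.joining(", "));
        return "\"speech_context\": {\"phrases\": [" + joined + "]}";
    }

    public static void main(String[] args) {
        System.out.println(build(Arrays.asList("eins", "zwei", "drei", "elf")));
    }
}
```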

answered Oct 08 '22 22:10

blambert

Tags:

java

android

speech-recognition

google-speech-api
