
KitKat takes 6 seconds more than Froyo to react to TextToSpeech.speak() on first call

On a recent phone running a recent version of Android, the TextToSpeech engine may take around 6 seconds longer to react when it is first called, compared to an older phone.

My test code is shown below. (EDITED: Alternative code for Android 4.0.3 Ice Cream Sandwich, API 15 and above, shown at the end.)

On a 1 year old Motorola Moto G running 4.4.4 KitKat, it can take over 7 seconds for the TextToSpeech engine to complete the first call to speak() the word "Started". Here's the output of my code.

D/speak﹕ call: 1415501851978
D/speak﹕ done: 1415501859122, delay: 7144

On a 3 year old Samsung SGH-T499Y running 2.2 Froyo, it takes less than a second to finish speaking:

D/speak﹕ call: 1415502283050
D/speak﹕ done: 1415502283900, delay: 850

Is there a way to discover what is happening during this 6-second delay?
Is there some way to get the newer (and supposedly faster) device to react more quickly?

package com.example.speak;

import android.app.Activity;
import android.speech.tts.TextToSpeech;
import android.os.Bundle;
import android.util.Log;

import java.util.HashMap;
import java.util.Locale;


public class MainActivity extends Activity implements TextToSpeech.OnInitListener,
        TextToSpeech.OnUtteranceCompletedListener {

    private final String TAG = "speak";
    private Activity activity;
    private TextToSpeech tts;
    private long launchTime;

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.activity_main);

        tts = new TextToSpeech(getApplicationContext(), this);
    }

    public void onInit(int status) {
        if (status == TextToSpeech.SUCCESS) {
            tts.setOnUtteranceCompletedListener(this);
            tts.setLanguage(Locale.UK);
            ttsSay("Started");
        }
    }

    private void ttsSay(String toSpeak) {
        int mode = TextToSpeech.QUEUE_FLUSH;
        HashMap<String, String> hashMap = new HashMap<String, String>();
        hashMap.put(TextToSpeech.Engine.KEY_PARAM_UTTERANCE_ID, TAG);

        launchTime = System.currentTimeMillis();
        Log.d(TAG, "call: " + launchTime);
        tts.speak(toSpeak, mode, hashMap);
    }

    public void onUtteranceCompleted(String utteranceID) {
        long millis = System.currentTimeMillis();
        Log.d(TAG, "done: " + millis + ", delay: " + (millis - launchTime));
    }
}

EDIT: Starting with Ice Cream Sandwich 4.0.3, API 15, Android provides UtteranceProgressListener, which can be used to time both the start and the end of the text-to-speech playback. The following code is not compatible with Froyo:

package com.example.announceappprogress;

import android.app.Activity;
import android.speech.tts.TextToSpeech;
import android.os.Bundle;
import android.speech.tts.UtteranceProgressListener;
import android.util.Log;

import java.util.HashMap;
import java.util.Locale;


public class MainActivity extends Activity implements TextToSpeech.OnInitListener {

    private final String TAG = "speak";
    private TextToSpeech tts;
    private long launchTime;
    private long startTime;


    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.activity_main);

        tts = new TextToSpeech(getApplicationContext(), this);
        tts.setOnUtteranceProgressListener(mProgressListener);
    }

    public void onInit(int status) {
        if (status == TextToSpeech.SUCCESS) {
            tts.setLanguage(Locale.UK);
            ttsSay("Started");
        }
    }

    private void ttsSay(String toSpeak) {
        int mode = TextToSpeech.QUEUE_FLUSH;
        HashMap<String, String> hashMap = new HashMap<String, String>();
        hashMap.put(TextToSpeech.Engine.KEY_PARAM_UTTERANCE_ID, TAG);

        launchTime = System.currentTimeMillis();
        Log.d(TAG, "called: " + launchTime);
        tts.speak(toSpeak, mode, hashMap);
    }

    private final UtteranceProgressListener mProgressListener = new UtteranceProgressListener() {
        @Override
        public void onStart(String utteranceId) {
            startTime = System.currentTimeMillis();
            Log.d(TAG, "started: " + startTime + ", delay: " + (startTime - launchTime));
        }

        @Override
        public void onError(String utteranceId) {} // Do nothing.


        @Override
        public void onDone(String utteranceId) {
            long millis = System.currentTimeMillis();
            Log.d(TAG, "done: " + millis + ", total: " + (millis - launchTime) + ", duration: " + (millis - startTime));
        }
    };
}

Here is a sample of the output that this gives on the Motorola Moto G running 4.4.4 KitKat:

D/speak﹕ called:  1415654293442
D/speak﹕ started: 1415654299287, delay: 5845
D/speak﹕ done:    1415654299995, total: 6553, duration: 708
James Newton asked Nov 10 '22 23:11



1 Answer

You are probably not using the same TTS engine on both devices.

More human-sounding concatenative TTS engines (which you may have installed on your newer device) can use hundreds of megabytes of data files to generate speech. Most of these systems require a certain amount of setup time for the first utterance. Simple (and more mechanical-sounding) formant-based systems may require just a couple of megabytes, and so load much more quickly.

An interesting experiment would be to time the second utterance. I predict that it will be quicker than the first one was on your newer phone. Also, more natural-sounding TTS systems generally have a longer latency between the call to the TTS engine and the beginning of sound from the utterance, particularly when a long sentence is given, since the system looks over the entire sentence to formulate the best phrasing before it begins to speak.
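To run that experiment, each utterance needs its own ID so that the first and second latencies can be told apart. Here is a minimal sketch of one way to keep the bookkeeping. UtteranceTimer and its method names are hypothetical (they are not part of the Android API), and the class is plain Java so it can be tried outside an app. The only timestamps used in main() are the ones from the Moto G log in the question.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical helper (not an Android class): records when each utterance
 * was requested and when playback started, keyed by utterance ID.
 */
class UtteranceTimer {
    private final Map<String, Long> callTimes = new HashMap<String, Long>();
    private final List<Long> latencies = new ArrayList<Long>();

    /** Call just before tts.speak(); use the same ID you pass as KEY_PARAM_UTTERANCE_ID. */
    public synchronized void markCalled(String utteranceId, long nowMillis) {
        callTimes.put(utteranceId, nowMillis);
    }

    /**
     * Call from UtteranceProgressListener.onStart().
     * Returns the latency in milliseconds, or -1 if the ID was never marked.
     */
    public synchronized long markStarted(String utteranceId, long nowMillis) {
        Long called = callTimes.remove(utteranceId);
        if (called == null) {
            return -1;
        }
        long latency = nowMillis - called;
        latencies.add(latency);
        return latency;
    }

    /** Latencies in the order the utterances started (first, second, ...). */
    public synchronized List<Long> getLatencies() {
        return new ArrayList<Long>(latencies);
    }

    public static void main(String[] args) {
        UtteranceTimer timer = new UtteranceTimer();
        // "called" and "started" timestamps copied from the KitKat log in the question.
        timer.markCalled("utt-1", 1415654293442L);
        long first = timer.markStarted("utt-1", 1415654299287L);
        System.out.println("first utterance latency: " + first + " ms"); // prints 5845 ms
    }
}
```

In the app, you would call markCalled("utt-1", System.currentTimeMillis()) before the first speak(), then queue a second utterance with ID "utt-2" from onDone() and compare the two entries in getLatencies(). The methods are synchronized on the assumption that the listener callbacks arrive on a different thread from the one that calls speak().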

Also, are you sure your new device is not using a cloud-based TTS service? That would introduce significant additional variables that affect latency.

industrialpoet answered Nov 15 '22 11:11
