Speech to text by AWS service using Java API

Question

I would like to convert speech to text using an AWS service and the AWS java-sdk, but I am unable to find any API in the AWS java-sdk. Is there any service which does this? I have used AWS Polly service to convert text to speech using AWS java-sdk, but not the reverse (speech to text). How could this be done?

Edgardo Genini · Accepted Answer

I recently managed to build a Java client for this with Amazon Transcribe. Before you invest time in it, be aware that, as of the date of this post, it takes approximately 1 minute to get the text for an audio clip that contains only a "Yes". Given that performance, I opted for the Google service.

That said, here is the code. It can certainly be improved, since it was only meant as a feasibility test.

The service requires the audio to be stored in an S3 bucket. You give Transcribe the URI of the file, start a job, and then retrieve the result, in JSON format, in a similar way.

In this example, we wait for the job to finish and then fetch the result.

The main dependencies are:

<!-- https://mvnrepository.com/artifact/com.amazonaws/aws-java-sdk-transcribe -->
<dependency>
    <groupId>com.amazonaws</groupId>
    <artifactId>aws-java-sdk-transcribe</artifactId>
    <version>1.11.313</version>
</dependency>
<!-- https://mvnrepository.com/artifact/com.amazonaws/aws-java-sdk-s3 -->
<dependency>
    <groupId>com.amazonaws</groupId>
    <artifactId>aws-java-sdk-s3</artifactId>
    <version>1.11.313</version>
</dependency>

My choice of credentials (system properties that the default credentials provider chain reads; do not commit real keys):

static{
    System.setProperty("aws.accessKeyId", "yourAccessK");
    System.setProperty("aws.secretKey"  , "shhhhhhhhhh");
}

In the class we create the S3 and Transcribe clients. Replace the region with the one where your bucket lives.

private AmazonS3 s3 = AmazonS3ClientBuilder.standard()
        .withRegion("us-east-1")
        .withClientConfiguration(new ClientConfiguration())
        .withCredentials(new DefaultAWSCredentialsProviderChain())
        .build();
private AmazonTranscribe client = AmazonTranscribeClient.builder()
        .withRegion("us-east-1")
        .build();

Then we upload the audio file to the bucket:

s3.putObject(BUCKET_NAME, fileName, new File(fullFileName));

BUCKET_NAME is a constant holding the name of the bucket. fileName is the object key; it does not have to be the file's actual name and can be any identifier we want to use. fullFileName is the path to the local audio file.
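
Because the key can be any identifier, one option is to generate it, so that repeated uploads do not overwrite each other. The following is only a sketch: S3Keys, the "audio" prefix and the bucket name are hypothetical and not part of the AWS SDK. The s3:// form it prints is, as far as I can tell, the notation the Transcribe documentation uses for media locations; the code in this answer passes the HTTPS URL from s3.getUrl instead.

```java
import java.util.UUID;

class S3Keys {
    /** Builds a unique object key under the given prefix, keeping the local file's extension. */
    static String uniqueKey(String prefix, String localFileName) {
        int dot = localFileName.lastIndexOf('.');
        int sep = Math.max(localFileName.lastIndexOf('/'), localFileName.lastIndexOf('\\'));
        // Only treat the dot as an extension marker if it sits inside the file name itself.
        String ext = dot > sep + 1 ? localFileName.substring(dot) : "";
        return prefix + "/" + UUID.randomUUID() + ext;
    }

    /** Formats the s3:// notation for an object location. */
    static String s3Uri(String bucket, String key) {
        return "s3://" + bucket + "/" + key;
    }

    public static void main(String[] args) {
        String key = uniqueKey("audio", "/tmp/answer.wav"); // hypothetical local path
        System.out.println(s3Uri("my-example-bucket", key)); // hypothetical bucket name
    }
}
```

The generated key would be used in place of fileName in the putObject call.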

Once the audio is in the bucket, we create the transcription job request:

StartTranscriptionJobRequest request = new StartTranscriptionJobRequest();
request.withLanguageCode(LanguageCode.EsUS); // US Spanish; pick the code that matches your audio

Media media = new Media();
media.setMediaFileUri(s3.getUrl(BUCKET_NAME, fileName).toString()); // URL of the uploaded object

request.withMedia(media).withMediaSampleRateHertz(8000); // sample rate of the audio, in Hz

Check that the language code and MediaSampleRateHertz match your audio.
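
To find the right value for MediaSampleRateHertz, the sample rate can be read from the WAV file itself. A minimal sketch, assuming a standard RIFF/WAVE file; WavInfo and sampleRateHz are hypothetical names, not part of any SDK, and the method returns -1 when it cannot find the "fmt " chunk:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

class WavInfo {
    /** Returns the sample rate in Hz from the "fmt " chunk of RIFF/WAVE data, or -1 if not found. */
    static int sampleRateHz(byte[] data) {
        if (data.length < 12) return -1;
        if (!tag(data, 0).equals("RIFF") || !tag(data, 8).equals("WAVE")) return -1;
        ByteBuffer buf = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
        int pos = 12; // first chunk starts right after "RIFF" <size> "WAVE"
        while (pos + 8 <= data.length) {
            String id = tag(data, pos);
            int size = buf.getInt(pos + 4);
            if (id.equals("fmt ")) {
                // chunk body layout: format (2 bytes), channels (2 bytes), sample rate (4 bytes), ...
                return pos + 16 <= data.length ? buf.getInt(pos + 12) : -1;
            }
            if (size < 0 || size > data.length - pos - 8) return -1;
            pos += 8 + size + (size & 1); // chunks are padded to an even length
        }
        return -1;
    }

    private static String tag(byte[] data, int offset) {
        return new String(data, offset, 4, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 1) {
            System.out.println(sampleRateHz(Files.readAllBytes(Paths.get(args[0]))) + " Hz");
        }
    }
}
```

The returned value could then be passed to withMediaSampleRateHertz instead of the hard-coded 8000.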

Create a name for the job.

String transcriptionJobName = "myJob"; // should be unique; treat it like an id
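
One way to get a unique job name is to append a random UUID to a readable prefix. This is only a sketch: JobNames is a hypothetical helper, and the character set and 200-character limit in the comments are my reading of the Transcribe API reference for TranscriptionJobName, so check them against the current documentation.

```java
import java.util.UUID;

class JobNames {
    /** Returns prefix + "-" + random UUID, restricted to letters, digits, '.', '_' and '-'. */
    static String uniqueJobName(String prefix) {
        // Replace anything outside the assumed allowed character set with '-'.
        String safePrefix = prefix.replaceAll("[^0-9a-zA-Z._-]", "-");
        String suffix = "-" + UUID.randomUUID();
        // Keep the whole name within the assumed 200-character limit.
        int maxPrefix = 200 - suffix.length();
        if (safePrefix.length() > maxPrefix) {
            safePrefix = safePrefix.substring(0, maxPrefix);
        }
        return safePrefix + suffix;
    }

    public static void main(String[] args) {
        System.out.println(uniqueJobName("my audio.wav"));
    }
}
```

The result would be assigned to transcriptionJobName in place of the fixed "myJob".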

Then complete the request and start the job:

request.setTranscriptionJobName(transcriptionJobName);
request.withMediaFormat("wav");

client.startTranscriptionJob(request);

Here a simple loop waits for the result; there are more efficient options.

GetTranscriptionJobRequest jobRequest = new GetTranscriptionJobRequest();
jobRequest.setTranscriptionJobName(transcriptionJobName);
TranscriptionJob transcriptionJob;
AmazonTranscription transcription = null; // stays null if the job fails

while (true) {
    transcriptionJob = client.getTranscriptionJob(jobRequest).getTranscriptionJob();
    String status = transcriptionJob.getTranscriptionJobStatus();
    if (status.equals(TranscriptionJobStatus.COMPLETED.name())) {
        // the job is done: fetch and parse the JSON result
        transcription = this.download(transcriptionJob.getTranscript().getTranscriptFileUri(), fileName);
        break;
    } else if (status.equals(TranscriptionJobStatus.FAILED.name())) {
        break;
    }
    // short pause between polls, to not be so anxious
    try {
        Thread.sleep(50);
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // keep the interrupt flag and stop waiting
        break;
    }
}
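
The waiting loop can also be written with a growing pause and an overall timeout, so that a stuck job does not block forever. A minimal sketch, independent of the AWS SDK: JobPoller is a hypothetical helper, the status source is passed in as a Supplier, and the "TIMED_OUT" and "INTERRUPTED" return values are my own convention, not Transcribe statuses.

```java
import java.util.function.Supplier;

class JobPoller {
    /**
     * Polls statusSource until it reports COMPLETED or FAILED, doubling the pause
     * between polls up to maxDelayMs. Returns the terminal status, "TIMED_OUT" if
     * timeoutMs elapses first, or "INTERRUPTED" if the thread is interrupted.
     */
    static String waitForTerminalStatus(Supplier<String> statusSource,
                                        long initialDelayMs, long maxDelayMs, long timeoutMs) {
        long deadline = System.currentTimeMillis() + timeoutMs;
        long delay = initialDelayMs;
        while (true) {
            String status = statusSource.get();
            if ("COMPLETED".equals(status) || "FAILED".equals(status)) {
                return status;
            }
            long remaining = deadline - System.currentTimeMillis();
            if (remaining <= 0) {
                return "TIMED_OUT";
            }
            try {
                Thread.sleep(Math.min(delay, remaining));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return "INTERRUPTED";
            }
            delay = Math.min(delay * 2, maxDelayMs);
        }
    }

    public static void main(String[] args) {
        // Fake status source standing in for the Transcribe call: completes on the third poll.
        int[] polls = {0};
        String result = waitForTerminalStatus(
                () -> ++polls[0] < 3 ? "IN_PROGRESS" : "COMPLETED", 10, 40, 1_000);
        System.out.println(result + " after " + polls[0] + " polls"); // prints "COMPLETED after 3 polls"
    }
}
```

With the client from this answer, the supplier would be something like () -> client.getTranscriptionJob(jobRequest).getTranscriptionJob().getTranscriptionJobStatus().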

transcriptionJob.getTranscript().getTranscriptFileUri() returns a URI that you can fetch with any HTTP client, such as Apache HttpClient or, as I prefer, Jodd (https://jodd.org/http/).

The download method:

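// assumed to be a com.google.gson.Gson instance; it maps the JSON onto AmazonTranscription
private final Gson gson = new Gson();

// HttpRequest and HttpResponse are the jodd.http classes
private AmazonTranscription download( String uri, String fileName ){
    HttpResponse response = HttpRequest.get(uri).send();
    String result = response.charset("UTF-8").bodyText();
    // result is the transcription JSON
    return gson.fromJson(result, AmazonTranscription.class);
}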

AmazonTranscription is a class I created to hold the JSON. Below are the classes needed for the JSON parsing; I have left out the getters and setters to keep it short.

public class AmazonTranscription {

    private String jobName;
    private String accountId;
    private Result results;
    private String status;
}

public class Item {

    private String start_time;
    private String end_time;
    private List<Alternative> alternatives = new ArrayList<Alternative>();
    private String type;
}

public class Result {

    private List<Transcript> transcripts = new ArrayList<Transcript>();
    private List<Item>       items       = new ArrayList<Item>();
}

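public class Transcript {

    private String transcript;
}

// referenced by Item; field names assumed from the Transcribe result JSON
public class Alternative {

    private String confidence;
    private String content;
}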
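
Once the JSON has been mapped onto these classes, the recognized text is in results.transcripts. A minimal sketch of reading it, using small stand-in classes so the example compiles on its own; TranscriptText and fullText are hypothetical names, and in the real code you would go through the getters of the classes above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;

class TranscriptText {
    // Minimal stand-ins for the Result and Transcript classes of this answer.
    static class Transcript {
        String transcript;
        Transcript(String transcript) { this.transcript = transcript; }
    }

    static class Result {
        List<Transcript> transcripts = new ArrayList<>();
    }

    /** Joins all non-empty transcript strings with a single space; returns "" when there is none. */
    static String fullText(Result results) {
        if (results == null || results.transcripts == null) {
            return "";
        }
        return results.transcripts.stream()
                .filter(Objects::nonNull)
                .map(t -> t.transcript == null ? "" : t.transcript.trim())
                .filter(s -> !s.isEmpty())
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        Result results = new Result();
        results.transcripts.add(new Transcript("Yes."));
        System.out.println(fullText(results)); // prints "Yes."
    }
}
```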

Add try/catch blocks where required.

I hope I have not overlooked anything and that this is useful. It took me some time to understand this Amazon model, and I hope to save others that time.

Sorry if there are errors in the writing; English is not my native language.

Tags:

amazon-web-services

R.SINGH
