Connect to Microsoft's Cognitive Speaker Recognition API via Xamarin.Android

I was building a test application to authenticate users via Microsoft's Cognitive Speaker Recognition API. It seems straightforward, but as mentioned in the API docs, when creating an enrollment I need to send the byte[] of the audio file I record. Since I am using Xamarin.Android, I was able to record the audio and save it. However, the API is quite specific about the format of that audio.

According to the API docs, the audio file format must meet the following requirements.

Container -> WAV
Encoding -> PCM
Rate -> 16K
Sample Format -> 16 bit
Channels -> Mono

Following this recipe I successfully recorded the audio, and after playing around a little with the Android docs I was able to apply these settings as well:

_recorder.SetOutputFormat(OutputFormat.ThreeGpp);

_recorder.SetAudioChannels(1);
_recorder.SetAudioSamplingRate(16);
_recorder.SetAudioEncodingBitRate(16000);

_recorder.SetAudioEncoder((AudioEncoder) Encoding.Pcm16bit);

This meets most of the criteria for the required audio, but I cannot seem to save the file in an actual ".wav" container, and I cannot verify whether the audio is actually PCM encoded.
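(For reference, one way to sanity-check the recorded file would be to read its WAV header directly; the sketch below assumes a canonical 44-byte header with no extra chunks, and the helper name is mine, not part of my project.)

// Minimal sketch: inspect a WAV header to confirm PCM / mono / 16 kHz / 16-bit.
// Assumes the canonical 44-byte header layout; requires using System and System.IO.
static void PrintWavInfo(string path)
{
    using (var reader = new BinaryReader(File.OpenRead(path)))
    {
        reader.ReadBytes(20);                        // "RIFF", size, "WAVE", "fmt ", chunk size
        short audioFormat   = reader.ReadInt16();    // 1 = PCM
        short channels      = reader.ReadInt16();    // 1 = mono
        int   sampleRate    = reader.ReadInt32();    // expect 16000
        reader.ReadInt32();                          // byte rate
        reader.ReadInt16();                          // block align
        short bitsPerSample = reader.ReadInt16();    // expect 16

        Console.WriteLine($"PCM: {audioFormat == 1}, Channels: {channels}, Rate: {sampleRate} Hz, Bits: {bitsPerSample}");
    }
}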

Here's my AXML and MainActivity.cs: Github Gist

I also followed this code and incorporated it into my code: Github Gist

The file's specs look just fine, but the duration is wrong. No matter how long I record, it just shows 250ms, which results in too-short audio.

Is there any way to do this? Basically, I just want to be able to connect to Microsoft's Cognitive Speaker Recognition API via Xamarin.Android, but I couldn't find any resource that covers this.

Asked Mar 15 '18 by Xonshiz


1 Answer

Audio Recording

Add the Audio Recorder Plugin NuGet Package to the Android Project (and to any PCL, netstandard, or iOS libraries if you are using them).
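Assuming the plugin's NuGet package ID is Plugin.AudioRecorder (the version below is only a placeholder), the reference in the Android project's .csproj would look something like this:

<!-- Package ID and version are assumptions; use the latest stable release from NuGet -->
<PackageReference Include="Plugin.AudioRecorder" Version="1.1.0" />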

Android Project Configuration

  1. In AndroidManifest.xml, add the following permissions:
<uses-permission android:name="android.permission.MODIFY_AUDIO_SETTINGS" />
<uses-permission android:name="android.permission.READ_EXTERNAL_STORAGE" />
<uses-permission android:name="android.permission.RECORD_AUDIO" />
<uses-permission android:name="android.permission.WRITE_EXTERNAL_STORAGE" />
<uses-permission android:name="android.permission.INTERNET" />
  2. In AndroidManifest.xml, add the following provider inside the <application></application> tag:
<provider android:name="android.support.v4.content.FileProvider" android:authorities="${applicationId}.fileprovider" android:exported="false" android:grantUriPermissions="true">
    <meta-data android:name="android.support.FILE_PROVIDER_PATHS" android:resource="@xml/file_paths"></meta-data>
</provider>


  3. In the Resources folder, create a new folder called xml.

  4. Inside Resources/xml, create a new file called file_paths.xml.


  5. In file_paths.xml, add the following code, replacing [your package name] with the package name of your Android project:
<?xml version="1.0" encoding="utf-8"?>
<paths xmlns:android="http://schemas.android.com/apk/res/android">
    <external-path name="my_images" path="Android/data/[your package name]/files/Pictures"/>
    <external-path name="my_movies" path="Android/data/[your package name]/files/Movies" />
</paths>

(Screenshot: example package name.)

Android Recorder Code

AudioRecorderService AudioRecorder { get; } = new AudioRecorderService
{
    StopRecordingOnSilence = true,  // stop automatically once the speaker goes quiet
    PreferredSampleRate = 16000     // matches the 16 kHz requirement of the Speaker Recognition API
};

public async Task StartRecording()
{
    AudioRecorder.AudioInputReceived += HandleAudioInputReceived;
    await AudioRecorder.StartRecording();
}

public async Task StopRecording()
{
    await AudioRecorder.StopRecording();
}

async void HandleAudioInputReceived(object sender, string e)
{
    AudioRecorder.AudioInputReceived -= HandleAudioInputReceived;

    PlaybackRecording();

    //replace [UserGuid] with your unique Guid
    await EnrollSpeaker(AudioRecorder.GetAudioFileStream(), [UserGuid]);
}
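To tie the recorder into a Xamarin.Android Activity, the methods above can be wired to two buttons. This is only a rough sketch: the layout name and button resource IDs are placeholders, and it assumes the AudioRecorder property and the methods above live in the same Activity.

// Rough sketch; requires using Android.App, Android.OS and Android.Widget.
// Resource.Layout.Main, Resource.Id.startButton and Resource.Id.stopButton are placeholders
// for your own AXML layout.
[Activity(Label = "SpeakerEnrollment", MainLauncher = true)]
public class MainActivity : Activity
{
    protected override void OnCreate(Bundle savedInstanceState)
    {
        base.OnCreate(savedInstanceState);
        SetContentView(Resource.Layout.Main);

        var startButton = FindViewById<Button>(Resource.Id.startButton);
        var stopButton = FindViewById<Button>(Resource.Id.stopButton);

        startButton.Click += async (s, e) => await StartRecording();
        stopButton.Click += async (s, e) => await StopRecording();
    }
}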

Cognitive Services Speaker Recognition Code

// Static so the static methods below can share it
static HttpClient Client { get; } = CreateHttpClient(TimeSpan.FromSeconds(10));

public static async Task<EnrollmentStatus?> EnrollSpeaker(Stream audioStream, Guid userGuid)
{
    Enrollment response = null;
    try
    {
        var boundaryString = "Upload----" + DateTime.Now.ToString("u").Replace(" ", "");
        var content = new MultipartFormDataContent(boundaryString)
        {
            { new StreamContent(audioStream), "enrollmentData", userGuid.ToString("D") + "_" + DateTime.Now.ToString("u") }
        };

        var requestUrl = "https://westus.api.cognitive.microsoft.com/spid/v1.0/verificationProfiles" + "/" + userGuid.ToString("D") + "/enroll";
        var result = await Client.PostAsync(requestUrl, content).ConfigureAwait(false);
        string resultStr = await result.Content.ReadAsStringAsync().ConfigureAwait(false);

        if (result.StatusCode == HttpStatusCode.OK)
            response = JsonConvert.DeserializeObject<Enrollment>(resultStr);

        return response?.EnrollmentStatus;
    }
    catch (Exception)
    {
        // Swallow request/serialization errors and fall through to return null
    }

    return response?.EnrollmentStatus;
}

static HttpClient CreateHttpClient(TimeSpan timeout)
{
    HttpClient client = new HttpClient();

    client.Timeout = timeout;
    client.DefaultRequestHeaders.AcceptEncoding.Add(new StringWithQualityHeaderValue("gzip"));
    client.DefaultRequestHeaders.Accept.Add(new MediaTypeWithQualityHeaderValue("application/json"));

    //replace [Your Speaker Recognition API Key] with your Speaker Recognition API Key from the Azure Portal
    client.DefaultRequestHeaders.Add("Ocp-Apim-Subscription-Key", [Your Speaker Recognition API Key]);

    return client;
}
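The [UserGuid] used in EnrollSpeaker has to come from a verification profile created beforehand. A minimal sketch of that call against the same v1.0 endpoint is below; the request body and the verificationProfileId response field follow my reading of the Speaker Recognition docs, so double-check them against the current API reference.

// Sketch: create a verification profile and return its GUID for use in EnrollSpeaker.
// Requires using Newtonsoft.Json.Linq and System.Text.
public static async Task<Guid?> CreateVerificationProfile()
{
    var requestUrl = "https://westus.api.cognitive.microsoft.com/spid/v1.0/verificationProfiles";
    var body = new StringContent("{\"locale\":\"en-us\"}", Encoding.UTF8, "application/json");

    var result = await Client.PostAsync(requestUrl, body).ConfigureAwait(false);
    var resultStr = await result.Content.ReadAsStringAsync().ConfigureAwait(false);

    if (result.StatusCode != HttpStatusCode.OK)
        return null;

    // Response field name assumed from the v1.0 docs
    return Guid.Parse(JObject.Parse(resultStr).Value<string>("verificationProfileId"));
}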

public class Enrollment : EnrollmentBase
{
    [JsonConverter(typeof(StringEnumConverter))]
    public EnrollmentStatus EnrollmentStatus { get; set; }
    public int RemainingEnrollments { get; set; }
    public int EnrollmentsCount { get; set; }
    public string Phrase { get; set; }
}

public enum EnrollmentStatus
{
    Enrolling,
    Training,
    Enrolled
}
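Once the profile has enough successful enrollments, verifying a speaker is a similar call. Again, this is only a sketch: the VerificationResult class is a placeholder I defined here, and the verify endpoint shape follows the v1.0 docs, so confirm the exact fields against the API reference.

// Sketch: verify a recorded utterance against an enrolled verification profile.
// VerificationResult is a placeholder type for this example, not part of the plugin or SDK.
public class VerificationResult
{
    public string Result { get; set; }      // e.g. "Accept" or "Reject"
    public string Confidence { get; set; }  // e.g. "Low", "Normal", "High"
    public string Phrase { get; set; }
}

public static async Task<VerificationResult> VerifySpeaker(Stream audioStream, Guid userGuid)
{
    var requestUrl = "https://westus.api.cognitive.microsoft.com/spid/v1.0/verify?verificationProfileId="
                     + userGuid.ToString("D");

    // The audio is sent as the raw request body
    var result = await Client.PostAsync(requestUrl, new StreamContent(audioStream)).ConfigureAwait(false);
    var resultStr = await result.Content.ReadAsStringAsync().ConfigureAwait(false);

    return result.StatusCode == HttpStatusCode.OK
        ? JsonConvert.DeserializeObject<VerificationResult>(resultStr)
        : null;
}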

Audio Playback

Configuration

Add the SimpleAudioPlayer Plugin NuGet Package to the Android Project (and to any PCL, netstandard, or iOS libraries if you are using them).

Code

public void PlaybackRecording()
{
    var isAudioLoaded = Plugin.SimpleAudioPlayer.CrossSimpleAudioPlayer.Current.Load(AudioRecorder.GetAudioFileStream());

    if (isAudioLoaded)
        Plugin.SimpleAudioPlayer.CrossSimpleAudioPlayer.Current.Play();
}
Answered Nov 08 '22 by Brandon Minnick