I was building a test application to authenticate users via Microsoft's Cognitive Speaker Recognition API. It seems straightforward, but as mentioned in the API docs, when creating an enrollment I need to send the byte[] of the audio file I record. Since I am using Xamarin.Android, I was able to record the audio and save it. However, Microsoft's Cognitive Speaker Recognition API is pretty specific about the requirements for that audio.
According to the API docs, the audio file must meet the following requirements:
Container -> WAV
Encoding -> PCM
Rate -> 16K
Sample Format -> 16 bit
Channels -> Mono
Following this recipe, I successfully recorded the audio, and after experimenting a little with the Android docs, I was able to implement these settings as well:
_recorder.SetOutputFormat(OutputFormat.ThreeGpp);            // 3GPP container, not WAV
_recorder.SetAudioChannels(1);                               // mono
_recorder.SetAudioSamplingRate(16000);                       // 16 kHz (the setter takes Hz)
_recorder.SetAudioEncodingBitRate(16000);
_recorder.SetAudioEncoder((AudioEncoder) Encoding.Pcm16bit); // cast hack; MediaRecorder has no PCM encoder
This meets most of the criteria for the required audio file, but I cannot seem to save the file in an actual ".wav" container, and I cannot verify whether the file is actually PCM-encoded or not.
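One way to settle the PCM question is to read the file's header back: a canonical WAV file begins with a 44-byte RIFF header that records the encoding, channel count, sample rate, and bit depth. A minimal sketch in plain C#, assuming the canonical layout where "fmt " is the first subchunk (true for simple recordings, though not for every WAV in the wild):
using System;
using System.IO;

static void PrintWavInfo(string path)
{
    using (var reader = new BinaryReader(File.OpenRead(path)))
    {
        string riff = new string(reader.ReadChars(4));  // "RIFF"
        reader.ReadInt32();                             // overall file size - 8
        string wave = new string(reader.ReadChars(4));  // "WAVE"
        string fmt  = new string(reader.ReadChars(4));  // "fmt "
        reader.ReadInt32();                             // fmt chunk size (16 for PCM)

        short audioFormat   = reader.ReadInt16();       // 1 == PCM
        short channels      = reader.ReadInt16();       // expect 1 (mono)
        int   sampleRate    = reader.ReadInt32();       // expect 16000
        reader.ReadInt32();                             // byte rate
        reader.ReadInt16();                             // block align
        short bitsPerSample = reader.ReadInt16();       // expect 16

        Console.WriteLine($"{riff}/{wave} {fmt}: format={audioFormat} (1 = PCM), " +
                          $"channels={channels}, rate={sampleRate} Hz, bits={bitsPerSample}");
    }
}
If format reads back as anything other than 1, or the rate isn't 16000, the recording won't pass the API's checks.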
Here's my AXML and MainActivity.cs: Github Gist
I also followed this code and incorporated it into mine: Github Gist
The file's specs look just fine, but the duration is wrong: no matter how long I record, it shows only 250 ms, which results in audio that is too short.
Is there any way to do this? Basically, I just want to be able to connect to Microsoft's Cognitive Speaker Recognition API from Xamarin.Android, and I couldn't find any resource to guide me.
Add the Audio Recorder Plugin NuGet package to the Android project (and to any PCL, .NET Standard, or iOS libraries if you are using them). Then add the following permissions to the Android project's AndroidManifest.xml:
<uses-permission android:name="android.permission.MODIFY_AUDIO_SETTINGS" />
<uses-permission android:name="android.permission.READ_EXTERNAL_STORAGE" />
<uses-permission android:name="android.permission.RECORD_AUDIO" />
<uses-permission android:name="android.permission.WRITE_EXTERNAL_STORAGE" />
<uses-permission android:name="android.permission.INTERNET" />
Next, add a provider inside the <application></application> tag:
<provider android:name="android.support.v4.content.FileProvider"
          android:authorities="${applicationId}.fileprovider"
          android:exported="false"
          android:grantUriPermissions="true">
    <meta-data android:name="android.support.FILE_PROVIDER_PATHS"
               android:resource="@xml/file_paths" />
</provider>
In the Resources folder, create a new folder called xml.
Inside Resources/xml, create a new file called file_paths.xml.
In file_paths.xml, add the following code, replacing [your package name] with the package of your Android project:
<?xml version="1.0" encoding="utf-8"?>
<paths xmlns:android="http://schemas.android.com/apk/res/android">
    <external-path name="my_images" path="Android/data/[your package name]/files/Pictures" />
    <external-path name="my_movies" path="Android/data/[your package name]/files/Movies" />
</paths>
AudioRecorderService AudioRecorder { get; } = new AudioRecorderService
{
    StopRecordingOnSilence = true,
    PreferredSampleRate = 16000 // 16 kHz, matching the API requirement
};
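StopRecordingOnSilence ends the capture automatically once the input goes quiet. If that cuts enrollment clips short, the plugin also supports a fixed recording window; the StopRecordingAfterTimeout and TotalAudioTimeout property names below are taken from the plugin's README, so verify them against the version you install:
AudioRecorderService AudioRecorder { get; } = new AudioRecorderService
{
    // Record a fixed window instead of stopping at the first quiet stretch
    StopRecordingOnSilence = false,
    StopRecordingAfterTimeout = true,
    TotalAudioTimeout = TimeSpan.FromSeconds(15),
    PreferredSampleRate = 16000
};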
public async Task StartRecording()
{
    AudioRecorder.AudioInputReceived += HandleAudioInputReceived;
    await AudioRecorder.StartRecording();
}
public async Task StopRecording()
{
    await AudioRecorder.StopRecording();
}
async void HandleAudioInputReceived(object sender, string e)
{
    AudioRecorder.AudioInputReceived -= HandleAudioInputReceived;
    PlaybackRecording();
    // replace [UserGuid] with your unique Guid
    await EnrollSpeaker(AudioRecorder.GetAudioFileStream(), [UserGuid]);
}
static HttpClient Client { get; } = CreateHttpClient(TimeSpan.FromSeconds(10));
public static async Task<EnrollmentStatus?> EnrollSpeaker(Stream audioStream, Guid userGuid)
{
    Enrollment response = null;
    try
    {
        var boundaryString = "Upload----" + DateTime.Now.ToString("u").Replace(" ", "");
        var content = new MultipartFormDataContent(boundaryString)
        {
            { new StreamContent(audioStream), "enrollmentData", userGuid.ToString("D") + "_" + DateTime.Now.ToString("u") }
        };
        var requestUrl = "https://westus.api.cognitive.microsoft.com/spid/v1.0/verificationProfiles/" + userGuid.ToString("D") + "/enroll";

        var result = await Client.PostAsync(requestUrl, content).ConfigureAwait(false);
        string resultStr = await result.Content.ReadAsStringAsync().ConfigureAwait(false);

        if (result.StatusCode == HttpStatusCode.OK)
            response = JsonConvert.DeserializeObject<Enrollment>(resultStr);
    }
    catch (Exception)
    {
        // Swallow network/serialization errors; the caller sees a null status
    }
    return response?.EnrollmentStatus;
}
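The userGuid passed to EnrollSpeaker is a verification profile ID issued by the Speaker Recognition service, created with a POST to the verificationProfiles endpoint before any enrollment. A sketch against the same v1.0 endpoint (the VerificationProfile response type is a hypothetical helper here, and the en-us locale is an assumption):
// Hypothetical helper type for the profile-creation response body
class VerificationProfile
{
    public Guid VerificationProfileId { get; set; }
}

public static async Task<Guid?> CreateVerificationProfile()
{
    try
    {
        // The v1.0 endpoint expects the locale in the JSON body
        var body = new StringContent("{\"locale\":\"en-us\"}", System.Text.Encoding.UTF8, "application/json");
        var result = await Client.PostAsync("https://westus.api.cognitive.microsoft.com/spid/v1.0/verificationProfiles", body).ConfigureAwait(false);

        if (result.StatusCode != HttpStatusCode.OK)
            return null;

        var json = await result.Content.ReadAsStringAsync().ConfigureAwait(false);
        return JsonConvert.DeserializeObject<VerificationProfile>(json)?.VerificationProfileId;
    }
    catch (Exception)
    {
        return null; // network/serialization failure; caller gets no profile ID
    }
}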
static HttpClient CreateHttpClient(TimeSpan timeout)
{
    var client = new HttpClient { Timeout = timeout };
    client.DefaultRequestHeaders.AcceptEncoding.Add(new StringWithQualityHeaderValue("gzip"));
    client.DefaultRequestHeaders.Accept.Add(new MediaTypeWithQualityHeaderValue("application/json"));
    // replace [Your Speaker Recognition API Key] with your Speaker Recognition API Key from the Azure Portal
    client.DefaultRequestHeaders.Add("Ocp-Apim-Subscription-Key", [Your Speaker Recognition API Key]);
    return client;
}
public class Enrollment : EnrollmentBase
{
    [JsonConverter(typeof(StringEnumConverter))]
    public EnrollmentStatus EnrollmentStatus { get; set; }
    public int RemainingEnrollments { get; set; }
    public int EnrollmentsCount { get; set; }
    public string Phrase { get; set; }
}
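Enrollment derives from an EnrollmentBase class that the snippet doesn't include; if you aren't copying it from the original sample, an empty placeholder is enough to compile (any shared fields it carried in the original are unknown here):
// Placeholder; the original sample's base class isn't shown in this answer
public class EnrollmentBase
{
}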
public enum EnrollmentStatus
{
    Enrolling,
    Training,
    Enrolled
}
Add the SimpleAudioPlayer Plugin NuGet package to the Android project (and to any PCL, .NET Standard, or iOS libraries if you are using them). Then you can play back the recording to verify it:
public void PlaybackRecording()
{
    var isAudioLoaded = Plugin.SimpleAudioPlayer.CrossSimpleAudioPlayer.Current.Load(AudioRecorder.GetAudioFileStream());
    if (isAudioLoaded)
        Plugin.SimpleAudioPlayer.CrossSimpleAudioPlayer.Current.Play();
}
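To drive this from the Xamarin.Android UI, the StartRecording/StopRecording methods above just need to be hooked to handlers; a short sketch with hypothetical button IDs (recordButton and stopButton are not from the gists above):
// inside MainActivity.OnCreate, after SetContentView
var recordButton = FindViewById<Button>(Resource.Id.recordButton); // hypothetical ID
var stopButton = FindViewById<Button>(Resource.Id.stopButton);     // hypothetical ID

recordButton.Click += async (sender, args) => await StartRecording();
stopButton.Click += async (sender, args) => await StopRecording();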