ML.Net 0.7 - Get Scores and Labels for MulticlassClassification

Tags: c#, ml.net

I'm using ML.NET 0.7 and have a MulticlassClassification model with the following result class:

public class TestClassOut
{
  public string Id { get; set; }
  public float[] Score { get; set; }
  public string PredictedLabel { get; set; }
}

I'd like to know which label each entry in the Score property corresponds to. It feels like I should be able to make the property a Tuple<string,float> or similar to get the label that each score represents.

I understand that there was a method for this in v0.5:

model.TryGetScoreLabelNames(out scoreLabels);

But I can't seem to find the equivalent in v0.7.

Can this be done? If so, how?

jondow asked Nov 12 '18 16:11


2 Answers

This is probably not the answer you're looking for, but I ended up copying the code from TryGetScoreLabelNames (it's in the Legacy namespace as of 0.7) and tweaking it to use the schema from my input data. The dataView below is an IDataView I created from my prediction input data so I could get the schema off of it.

// Adapted from the Legacy TryGetScoreLabelNames.
// `model` and `dataView` are defined elsewhere in the containing class.
public bool TryGetScoreLabelNames(out string[] names, string scoreColumnName = DefaultColumnNames.Score)
{
    names = null;
    Schema outputSchema = model.GetOutputSchema(dataView.Schema);
    if (!outputSchema.TryGetColumnIndex(scoreColumnName, out int col))
        return false;
    // The score column is a vector with one slot per class.
    int valueCount = outputSchema.GetColumnType(col).ValueCount;
    if (!outputSchema.HasSlotNames(col, valueCount))
        return false;
    // The slot names of the score column hold the class labels, in score order.
    VBuffer<ReadOnlyMemory<char>> vbuffer = new VBuffer<ReadOnlyMemory<char>>();
    outputSchema.GetMetadata<VBuffer<ReadOnlyMemory<char>>>("SlotNames", col, ref vbuffer);
    if (vbuffer.Length != valueCount)
        return false;
    names = new string[valueCount];
    int num = 0;
    foreach (ReadOnlyMemory<char> denseValue in vbuffer.DenseValues())
        names[num++] = denseValue.ToString();
    return true;
}

I also asked this question on the ML.NET Gitter channel (https://gitter.im/dotnet/mlnet) and got this response from Zruty0:

my best suggestion is to convert labels to 0..(N-1) beforehand, then train, and then inspect the resulting 'Score' column. It'll be a vector of size N, with per-class scores. PredictedLabel is actually just argmax(Score), and you can get the 2nd and other candidates by sorting Score

If you have a static set of classes this might be a better option, but my situation has an ever-growing set of classes.
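
To make the argmax-and-sort idea from that suggestion concrete, here is a minimal sketch in plain C# with no ML.NET dependency. It assumes the labels were already mapped to 0..(N-1) before training, so that index i of the score vector belongs to class i. The ScoreRanking class and RankClasses method are hypothetical names used only for illustration.

```csharp
using System;
using System.Linq;

public static class ScoreRanking
{
    // Returns the class indices ordered from highest to lowest score.
    // Assumption: index i of `scores` is the score of class i.
    public static int[] RankClasses(float[] scores)
    {
        return Enumerable.Range(0, scores.Length)
                         .OrderByDescending(i => scores[i])
                         .ToArray();
    }
}
```

The first element of the returned array plays the role of PredictedLabel (the argmax), and the following elements are the runner-up candidates.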

takvor answered Oct 06 '22 01:10


This was asked a while ago, but I think that this is still a very relevant question that surprisingly has not got a lot of traction and isn't mentioned (as of the time of writing) in any of the Microsoft ML.NET tutorials. The sample code above needs a bit of tweaking to get it to work with v1.5 (preview), so I thought I'd post how I got it working for anyone else who stumbles across this.

In ConsumeModel.cs (assuming you're using the Model Builder in Visual Studio):

...
            // Use model to make prediction on input data
            ModelOutput result = predEngine.Predict(input);
            var labelNames = new List<string>();

            var column = predEngine.OutputSchema.GetColumnOrNull("label");
            if (column.HasValue)
            {
                VBuffer<ReadOnlyMemory<char>> vbuffer = new VBuffer<ReadOnlyMemory<char>>();
                column.Value.GetKeyValues(ref vbuffer);

                foreach (ReadOnlyMemory<char> denseValue in vbuffer.DenseValues())
                    labelNames.Add(denseValue.ToString());
            }
...

The end result is that labelNames is now a parallel collection to result.Score: the label at index i names the class whose score is at index i. Keep in mind that changes to the generated files could be overwritten if you rebuild the model using Model Builder.
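
Once the two parallel collections are available, they can be combined into label/score pairs, which is roughly the Tuple<string,float> shape the question asked for. The following is a rough sketch in plain C#, not part of the generated Model Builder code. LabelScores and Pair are hypothetical names, and the length check assumes that a mismatch means the labels were read from the wrong column.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LabelScores
{
    // Pairs each label name with the score at the same index, then sorts
    // the pairs from highest to lowest score.
    public static List<KeyValuePair<string, float>> Pair(
        IReadOnlyList<string> labelNames, IReadOnlyList<float> scores)
    {
        if (labelNames.Count != scores.Count)
            throw new ArgumentException("labelNames and scores must have the same length.");

        return labelNames
            .Zip(scores, (label, score) => new KeyValuePair<string, float>(label, score))
            .OrderByDescending(pair => pair.Value)
            .ToList();
    }
}
```

With the variables from the snippet above, a call such as LabelScores.Pair(labelNames, result.Score) would return the most likely label first.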

Jason Sultana answered Oct 06 '22 00:10