
How to predict integer values using ML.NET?

I'm looking at the .cs file here: https://www.microsoft.com/net/learn/apps/machine-learning-and-ai/ml-dotnet/get-started/windows and everything works well.

Now I'd like to extend the example: I'd like to predict from a number-only dataset rather than a number-string dataset, for example to predict the output of a seven-segment display.

Here is my very simple dataset; the last column is the integer I want to predict:

1,0,1,1,1,1,1,0
0,0,0,0,0,1,1,1
1,1,1,0,1,1,0,2
1,1,1,0,0,1,1,3
0,1,0,1,0,1,1,4
1,1,1,1,0,0,1,5
1,1,1,1,1,0,1,6
1,0,0,0,0,1,1,7
1,1,1,1,1,1,1,8
1,1,1,1,0,1,1,9

And here is my test code:

public class Digit
{
    [Column("0")] public float Up;

    [Column("1")] public float Middle;

    [Column("2")] public float Bottom;

    [Column("3")] public float UpLeft;
    [Column("4")] public float BottomLeft;
    [Column("5")] public float TopRight;
    [Column("6")] public float BottomRight;

    [Column("7")] [ColumnName("DigitValue")]
    public float DigitValue;
}

public class DigitPrediction
{
    [ColumnName("PredictedDigits")] public float PredictedDigits;
}

public PredictDigit()
{
    var pipeline = new LearningPipeline();
    var dataPath = Path.Combine("Segmenti", "segments.txt");
    pipeline.Add(new TextLoader<Digit>(dataPath, false, ","));
    pipeline.Add(new ColumnConcatenator("Label", "DigitValue"));
    pipeline.Add(new ColumnConcatenator("Features", "Up", "Middle", "Bottom", "UpLeft", "BottomLeft", "TopRight", "BottomRight"));
    pipeline.Add(new StochasticDualCoordinateAscentClassifier());
    var model = pipeline.Train<Digit, DigitPrediction>();
    var prediction = model.Predict(new Digit
    {
        Up = 1,
        Middle = 1,
        Bottom = 1,
        UpLeft = 1,
        BottomLeft = 1,
        TopRight = 1,
        BottomRight = 1,
    });

    Console.WriteLine($"Predicted digit is: {prediction.PredictedDigits}");
    Console.ReadLine();
}

As you can see, it is very similar to the provided example, except for the handling of the last column ("Label"), because I need to predict a number and not a string. I tried:

pipeline.Add(new ColumnConcatenator("Label", "DigitValue"));

but it does not work and throws this exception:

Training label column 'Label' type is not valid for multi-class: Vec<R4, 1>. Type must be R4 or R8.

I'm sure I'm missing something, but I cannot find anything on the internet that helps me solve this problem.

UPDATE

I found that the dataset class has to have a Label column like this:

[Column("7")] [ColumnName("Label")] public float Label;

and the DigitPrediction class needs a Score column like this:

public class DigitPrediction
{
    [ColumnName("Score")] public float[] Score;
}

Now the system "works": prediction.Score comes back as a Single[] in which the index of the highest value is the predicted digit. Is this the right approach?
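To illustrate reading the prediction from the Score array, here is a minimal sketch of picking the index of the highest score. ScoreHelper and ArgMax are hypothetical names, not part of ML.NET, and the sketch assumes the entries of Score are ordered so that index 0 corresponds to digit 0, index 1 to digit 1, and so on.

```csharp
using System;

// Hypothetical helper, not part of ML.NET.
public static class ScoreHelper
{
    // Returns the index of the largest element in scores.
    // Assumption: index i of the score vector corresponds to class (digit) i.
    public static int ArgMax(float[] scores)
    {
        if (scores == null || scores.Length == 0)
            throw new ArgumentException("scores must be non-empty", nameof(scores));

        int best = 0;
        for (int i = 1; i < scores.Length; i++)
        {
            // Strict comparison keeps the first index when there is a tie.
            if (scores[i] > scores[best])
                best = i;
        }
        return best;
    }
}
```

With the classes above, it could be called as ScoreHelper.ArgMax(prediction.Score) right after model.Predict(...).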

UPDATE 2 - Complete code example

Following the answer and other suggestions, I got the right result. If you need it, you can find the complete code here.

asked May 23 '18 21:05 by Rowandish


2 Answers

Looks like you would need to add this field to your class DigitPrediction:

public class DigitPrediction
{
    [ColumnName("PredictedLabel")]
    public uint ExpectedDigit; // <-- This is the predicted value

    [ColumnName("Score")]
    public float[] Score; // <-- One score per class; the highest score marks the predicted class
}

And I think you would need to change the line where it writes the result to:

    Console.WriteLine($"Predicted digit is: {prediction.ExpectedDigit}");

One more thing: it looks like there is a bug in the API that makes the predicted digit off by one. Adding 1 to the predicted value gives the correct digit. I expect this to be fixed in the future; there is an issue for it: https://github.com/dotnet/machinelearning/issues/235

answered Oct 05 '22 23:10 by Kyle B


You could also try swapping ColumnConcatenator for ColumnCopier in the pipeline for the Label column.

pipeline.Add(new ColumnCopier(("DigitValue", "Label"))); // (source column, destination column)

That tells the pipeline which column is the Label. Unlike ColumnConcatenator, ColumnCopier does not output a vector.

You could similarly add a Score column as well.

answered Oct 06 '22 00:10 by amy8374