I am trying to write a C# wrapper method to make it easier for me to create, train and use an ML.NET Classification model WITHOUT having to hard-code a class containing my predictor variables and target variable. I have looked at all the examples and ML.NET documentation I could find but could not find a complete example from reading data to using the model.
Below is the method I have in mind. Note that the code for the variables "trainingDataView" and "dataProcessPipeline" is incomplete. I have spent all day trying various approaches, to no avail: I keep getting an error at the CrossValidate stage telling me that my target column was not found.
public static ITransformer CreateClassificationModelExample(MLContext mlContext, DataTable data, List<string> featureColumns, String targetColumn)
{
    //I am stuck here. Ideally I would like to see a code snippet to create a IDataView from the DataTable passed in as parameter
    //and then selecting only the columns in parameter 'featureColumns' and target = parameter 'targetColumn'
    var trainingDataView = ????;

    // Data process configuration with pipeline data transformations
    var dataProcessPipeline = mlContext.Transforms.Conversion.MapValueToKey(targetColumn, targetColumn)
        .Append(mlContext.Transforms.Categorical.OneHotEncoding(ValToKeys))
        .Append(mlContext.Transforms.Concatenate("Features", featureSet))
        .Append(mlContext.Transforms.NormalizeMinMax("Features", "Features"))
        .AppendCacheCheckpoint(mlContext);

    // Set the training algorithm
    var trainer = mlContext.MulticlassClassification.Trainers.SdcaMaximumEntropy(labelColumnName: targetColumn, featureColumnName: "Features")
        .Append(mlContext.Transforms.Conversion.MapKeyToValue("PredictedLabel", "PredictedLabel"));
    var trainingPipeline = dataProcessPipeline.Append(trainer);

    // Evaluate quality of Model
    var crossValidationResults = mlContext.MulticlassClassification.CrossValidate(trainingDataView, trainingPipeline, numberOfFolds: 5, labelColumnName: targetColumn);

    // Train Model
    ITransformer model = trainingPipeline.Fit(trainingDataView);
    return model;
}
I have thoroughly explored the ML.NET documentation, including the LoadFromEnumerable method example. Also I looked at the ML.NET blog and cookbook discussions on this topic.
PLEASE if someone can help with a code snippet to make the above method work I am sure that would help many others also! Thanks!
Well, after one more day of effort I got close, though the result is not yet completely free of compile-time modifications. The code below shows a wrapper that more or less does what I want, although it does require that the NUMBER of model features is known at compile time, which is better but far from ideal.
In the example below, I create an IDataView from a DataTable using only specific columns for predictors/features, and a specific column as the target for the classification model. The code then sets up a classification model (the example uses the "LbfgsMaximumEntropy" trainer), evaluates it using cross-validation and then trains it. I also show some code on how to create a prediction engine and make a prediction. NOTE THAT this code assumes you have 10 predictor/feature variables. But that 10 is easy to change (2 lines in class "Observation" shown below), which is much easier than writing a class each time you want to use a new data table to predict from.
Here is the code. It is a bit old style as I do not use Lambda Expressions:
using System;
using System.Collections.Generic;
using System.Data;
using Microsoft.ML;
using Microsoft.ML.Data;

public static ITransformer CreateClassificationModel(MLContext mlContext, DataTable data, List<string> predictorColumns, String TargetColumn, Dictionary<string, int> TargetMapper)
{
    //Create instances of the GENERIC class Observation and set the values from the DataTable
    //using only the required predictor columns and the target column
    List<Observation> observations = new List<Observation>();
    int iRow = 0;
    foreach (DataRow row in data.Rows)
    {
        var obs = new Observation();
        int iFeature = 0;
        foreach (string predictorColumn in predictorColumns)
        {
            //Observation.Features has a fixed length, so predictorColumns must not contain more entries than that
            obs.Features[iFeature] = Convert.ToSingle(row[predictorColumn]);
            iFeature++;
        }
        //Translate the target's string value into its integer code
        obs.Target = TargetMapper[row[TargetColumn].ToString()];
        observations.Add(obs);
        iRow++;
    }
    IEnumerable<Observation> dataNew = observations;
    var definedSchema = SchemaDefinition.Create(typeof(Observation));

    // Load the observations into an IDataView using the schema derived from Observation
    IDataView trainingDataView = mlContext.Data.LoadFromEnumerable(observations, definedSchema);
    var featureSet = new String[1];
    featureSet[0] = "Features";

    // Data process configuration with pipeline data transformations
    var dataProcessPipeline = mlContext.Transforms.Conversion.MapValueToKey("Target", "Target")
        .Append(mlContext.Transforms.Concatenate("Features", featureSet))
        .AppendCacheCheckpoint(mlContext);

    // Set the training algorithm
    var trainer = mlContext.MulticlassClassification.Trainers.LbfgsMaximumEntropy(labelColumnName: "Target", featureColumnName: "Features")
        .Append(mlContext.Transforms.Conversion.MapKeyToValue("PredictedLabel", "PredictedLabel"));
    // Declared and assigned once (a doubled assignment here does not compile)
    IEstimator<ITransformer> trainingPipeline = dataProcessPipeline.Append(trainer);

    // Evaluate quality of Model
    var crossValidationResults = mlContext.MulticlassClassification.CrossValidate(trainingDataView, trainingPipeline, numberOfFolds: 5, labelColumnName: "Target");

    // Train Model
    ITransformer model = trainingPipeline.Fit(trainingDataView);
    return model;
}
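The method above expects a TargetMapper dictionary that turns each string value of the target column into an integer code, but the post does not show how it is built. A minimal sketch of one way to do it, assuming the codes can simply be assigned in order of first appearance; the class and method names (TargetMapping, BuildTargetMapper) are hypothetical and not part of ML.NET:

```csharp
using System;
using System.Collections.Generic;
using System.Data;

public static class TargetMapping
{
    // Hypothetical helper (not from the original post): assigns 0, 1, 2, ...
    // to each distinct value of the target column, in order of first appearance.
    public static Dictionary<string, int> BuildTargetMapper(DataTable data, string targetColumn)
    {
        var mapper = new Dictionary<string, int>();
        foreach (DataRow row in data.Rows)
        {
            // Same string conversion as the lookup in CreateClassificationModel
            string label = row[targetColumn].ToString();
            if (!mapper.ContainsKey(label))
            {
                // mapper.Count is read before the new entry is added
                mapper.Add(label, mapper.Count);
            }
        }
        return mapper;
    }
}
```

The result could then be passed as the TargetMapper argument, for example TargetMapping.BuildTargetMapper(data, TargetColumn).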
To test/use this model, the following PredictionEngine can be used (snippet):
List<Observation> testData = GetTestDataList(); //Get some test data as Observations
// Create a prediction engine from the model for feeding new data.
var engine = mlContext.Model.CreatePredictionEngine<Observation, ModelOutput>(model);
//Make a prediction. The result is of type ModelOutput, class shown below.
var output = engine.Predict(testData[0]);
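Since the target was converted to an integer code through TargetMapper before training, output.Prediction comes back as that code rather than the original label. A small sketch of how the map could be inverted to recover the label; LabelLookup and Invert are hypothetical names used only for illustration:

```csharp
using System;
using System.Collections.Generic;

public static class LabelLookup
{
    // Hypothetical helper: inverts the label-to-code map so that a predicted
    // code can be turned back into the original label string.
    public static Dictionary<int, string> Invert(Dictionary<string, int> targetMapper)
    {
        var inverse = new Dictionary<int, string>();
        foreach (KeyValuePair<string, int> pair in targetMapper)
        {
            inverse[pair.Value] = pair.Key;
        }
        return inverse;
    }
}
```

With that in place, the label for a prediction would be LabelLookup.Invert(TargetMapper)[output.Prediction], assuming every code in the map is unique.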
And finally, below are the definitions for the two classes needed in the above code:
public class Observation
{
    private float[] m_Features = new Single[10];

    [VectorType(10)]
    public float[] Features
    {
        get
        {
            return m_Features;
        }
    }

    public int Target { get; set; }
}

public class ModelOutput
{
    // ColumnName attribute is used to change the column name from
    // its default value, which is the name of the field.
    [ColumnName("PredictedLabel")]
    public Int32 Prediction { get; set; }

    public float[] Score { get; set; }
}
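The hard-coded 10 in Observation can probably be avoided altogether, because the vector size of a column can be set on the SchemaDefinition at run time instead of through the VectorType attribute. The following is only a sketch under that assumption: DynamicObservation and DynamicSchema are hypothetical names, and the approach has not been tested against the wrapper above.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.ML;
using Microsoft.ML.Data;

// Hypothetical variant of Observation without the fixed [VectorType(10)] attribute.
public class DynamicObservation
{
    public float[] Features { get; set; }
    public int Target { get; set; }
}

public static class DynamicSchema
{
    // Builds a schema in which "Features" is a fixed-size vector of floats
    // whose length is supplied at run time.
    public static SchemaDefinition Create(int featureCount)
    {
        var schema = SchemaDefinition.Create(typeof(DynamicObservation));
        schema["Features"].ColumnType = new VectorDataViewType(NumberDataViewType.Single, featureCount);
        return schema;
    }

    // Loads the rows into an IDataView using the run-time schema.
    // Each row's Features array is assumed to have exactly featureCount entries.
    public static IDataView Load(MLContext mlContext, IEnumerable<DynamicObservation> rows, int featureCount)
    {
        return mlContext.Data.LoadFromEnumerable(rows, Create(featureCount));
    }
}
```

In the wrapper, featureCount would be predictorColumns.Count. The prediction engine would presumably need the same definition, passed through the inputSchemaDefinition parameter of CreatePredictionEngine, so that it also sees Features as a vector of that size.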