I'm having a problem training a model. I have a set of HTTP requests and I want to identify whether each request is coming from a bot or not. To train this, I have a collection of objects like these:
    public class Request
    {
        public string Url { get; set; }
        public string UserAgent { get; set; }
        public bool IsBot { get; set; }
    }
And a prediction class like this:
    public class IsBotPrediction
    {
        [ColumnName("PredictedLabel")]
        public bool Prediction { get; set; }
        public float Score { get; set; }
    }
Just for this example, I have created a list of hardcoded data:
    var trainingData = new List<Request>
    {
        new Request { Url = "/wp-admin", UserAgent = "a bot", IsBot = true },
        new Request { Url = "/backoffice", UserAgent = "a bot", IsBot = true },
        new Request { Url = "/hack", UserAgent = "a bot", IsBot = true },
        new Request { Url = "/login", UserAgent = "a bot", IsBot = false },
        new Request { Url = "/dashboard", UserAgent = "a bot", IsBot = false },
        new Request { Url = "/humans.txt", UserAgent = "a bot", IsBot = false },
        new Request { Url = "/admin", UserAgent = "a bot", IsBot = true },
    };
To train a model I'm using the following code:
    IDataView mlData = mlContext.Data.LoadFromEnumerable(trainingData);
    var dataPrepPipeline = mlContext
        .Transforms
        .Text
        .FeaturizeText("UrlF", "Url")
        .Append(mlContext.Transforms.Text.FeaturizeText("UserAgentF", "UserAgent"))
        .Append(mlContext.Transforms.Concatenate("Features", "UrlF", "UserAgentF"))
        .Append(mlContext.Transforms.NormalizeMinMax("Features", "Features"))
        .AppendCacheCheckpoint(mlContext);
    var prepPipeline = dataPrepPipeline.Fit(mlData);
    var trainer = mlContext
        .BinaryClassification
        .Trainers
        .AveragedPerceptron(labelColumnName: "IsBot", numberOfIterations: 10, featureColumnName: "Features");
    var preprocessedData = prepPipeline.Transform(mlData);
    ITransformer trainedModel = trainer.Fit(preprocessedData);
Training appears to succeed. But when I try to create a prediction engine:
    var predEngine = mlContext.Model.CreatePredictionEngine<Request, IsBotPrediction>(trainedModel);
I get the following exception:
    System.ArgumentOutOfRangeException: 'Features column 'Feature' not found (Parameter 'schema')'
Can you please help me figure out what this means?
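This is most likely because the data is transformed separately, before the trainer is fitted. The trained model then contains only the trainer, not the data preparation steps, while the prediction engine receives raw Request objects, which have Url and UserAgent columns but no Features column.
Appending the trainer to the data preparation pipeline and fitting the combined pipeline on the raw data should work: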
    // Data preparation: featurize both text columns, concatenate them, and normalize.
    var dataPrepPipeline = mlContext.Transforms.Text.FeaturizeText("UrlF", "Url")
        .Append(mlContext.Transforms.Text.FeaturizeText("UserAgentF", "UserAgent"))
        .Append(mlContext.Transforms.Concatenate("Features", "UrlF", "UserAgentF"))
        .Append(mlContext.Transforms.NormalizeMinMax("Features", "Features"))
        .AppendCacheCheckpoint(mlContext);

    // Optional: fit and save the data preparation transforms on their own.
    var dataPrepModel = dataPrepPipeline.Fit(mlData);
    // Save takes the schema of the data fed into the model, i.e. the raw input.
    mlContext.Model.Save(dataPrepModel, mlData.Schema, "./dataprep.zip");

    // Append the trainer so the fitted model includes the data preparation steps.
    var pipeline = dataPrepPipeline.Append(
        mlContext.BinaryClassification.Trainers.AveragedPerceptron(labelColumnName: "IsBot", numberOfIterations: 10, featureColumnName: "Features"));

    // Fit on the raw data; the pipeline produces the Features column itself.
    var model = pipeline.Fit(mlData);
    mlContext.Model.Save(model, mlData.Schema, "./model.zip");

    // The model now accepts Request objects directly.
    var predEngine = mlContext.Model.CreatePredictionEngine<Request, IsBotPrediction>(model);
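If you would rather keep the separate data preparation and training steps from the question, another option may be to chain the two fitted transformers before creating the prediction engine. The following is a minimal sketch rather than a definitive implementation: it assumes the Microsoft.ML NuGet package is referenced, and the `fullModel` variable, the shortened training list, and the sample request are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.ML;
using Microsoft.ML.Data;

public class Request
{
    public string Url { get; set; }
    public string UserAgent { get; set; }
    public bool IsBot { get; set; }
}

public class IsBotPrediction
{
    [ColumnName("PredictedLabel")]
    public bool Prediction { get; set; }
    public float Score { get; set; }
}

public static class Program
{
    public static void Main()
    {
        var mlContext = new MLContext();

        // Shortened version of the hardcoded training data from the question.
        var trainingData = new List<Request>
        {
            new Request { Url = "/wp-admin", UserAgent = "a bot", IsBot = true },
            new Request { Url = "/hack", UserAgent = "a bot", IsBot = true },
            new Request { Url = "/login", UserAgent = "a bot", IsBot = false },
            new Request { Url = "/humans.txt", UserAgent = "a bot", IsBot = false },
        };
        IDataView mlData = mlContext.Data.LoadFromEnumerable(trainingData);

        // Step 1: fit the data preparation transforms on the raw data.
        var prepPipeline = mlContext.Transforms.Text.FeaturizeText("UrlF", "Url")
            .Append(mlContext.Transforms.Text.FeaturizeText("UserAgentF", "UserAgent"))
            .Append(mlContext.Transforms.Concatenate("Features", "UrlF", "UserAgentF"))
            .Append(mlContext.Transforms.NormalizeMinMax("Features", "Features"))
            .Fit(mlData);

        // Step 2: fit the trainer on the already transformed data.
        IDataView preprocessedData = prepPipeline.Transform(mlData);
        ITransformer trainedModel = mlContext.BinaryClassification.Trainers
            .AveragedPerceptron(labelColumnName: "IsBot", featureColumnName: "Features", numberOfIterations: 10)
            .Fit(preprocessedData);

        // Step 3: chain both fitted transformers so raw Request objects
        // pass through data preparation before they reach the trainer's model.
        ITransformer fullModel = prepPipeline.Append(trainedModel);

        var predEngine = mlContext.Model.CreatePredictionEngine<Request, IsBotPrediction>(fullModel);

        // Hypothetical request; IsBot stays at its default because it is what we predict.
        var sample = new Request { Url = "/wp-admin", UserAgent = "a bot" };
        IsBotPrediction result = predEngine.Predict(sample);
        Console.WriteLine($"IsBot: {result.Prediction}, score: {result.Score}");
    }
}
```

Either way, the point is the same: whatever is passed to CreatePredictionEngine has to start from the columns of Request and produce the Features column itself. With only a handful of training rows, the predicted values are not meaningful; the sketch only shows that the engine can be created and called.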