I'm trying to learn ML.NET and get into machine learning, but I'm stuck on an issue.
My goal is to create a trained model that can predict a city based on input.
This code:
var dataPath = "cities.csv";
var mlContext = new MLContext();
var loader = mlContext.Data.CreateTextLoader<CityData>(hasHeader: false, separatorChar: ',');
var data = loader.Load(dataPath);
string featuresColumnName = "Features";
var pipeline = mlContext.Transforms.Concatenate(featuresColumnName, "PostalCode", "CityName")
    .Append(mlContext.Clustering.Trainers.KMeans(featuresColumnName, clustersCount: 3));
var model = pipeline.Fit(data);
It should take a CSV file as input (a list of cities, where column 0 is the postal code and column 1 is the city name) and add both columns as features to the pipeline, but it gives the following error:
Unhandled Exception: System.ArgumentOutOfRangeException: Schema mismatch for feature column 'Features': expected Vector<R4>, got Vector<Text>
The exception is thrown by the call to Fit.
I've done a bit of digging in the GitHub repo, but I can't seem to find a solution. I'm working from the Iris clustering example (https://learn.microsoft.com/en-us/dotnet/machine-learning/tutorials/iris-clustering), with my own modifications.
Any ideas?
Use FeaturizeText to transform each string column into a vector of floats before concatenating. The KMeans trainer expects a numeric feature vector (Vector<R4>), not text:
var pipeline = mlContext.Transforms
    .Text.FeaturizeText("PostalCodeF", "PostalCode")
    .Append(mlContext.Transforms.Text.FeaturizeText("CityNameF", "CityName"))
    .Append(mlContext.Transforms.Concatenate(featuresColumnName, "PostalCodeF", "CityNameF"))
    .Append(mlContext.Clustering.Trainers.KMeans(featuresColumnName, clustersCount: 3));
var model = pipeline.Fit(data);
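The Vector<Text> in the exception follows from how the input columns are typed. The question does not show the CityData class, so the following is only a guess at its shape: a minimal sketch that assumes both CSV columns are loaded as strings and that the ML.NET version in use provides the LoadColumn attribute from Microsoft.ML.Data.

```csharp
using Microsoft.ML.Data;

// Hypothetical shape of CityData; the original post does not include it.
public class CityData
{
    // Column 0 of cities.csv: the postal code, loaded as text.
    [LoadColumn(0)]
    public string PostalCode;

    // Column 1 of cities.csv: the city name, loaded as text.
    [LoadColumn(1)]
    public string CityName;
}
```

With string fields like these, Concatenate on the raw columns yields a vector of text, which matches the "got Vector<Text>" part of the error. Running FeaturizeText on each column first produces numeric vectors, so the concatenated Features column has the type the trainer expects.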