I have this training data in a CSV file:
RoomType, HasWater
bathroom, true
living_room, false
storage, false
kitchen, true
...
I am using golearn to train on this data with a decision tree (I'm not sure whether a decision tree is the right choice):
func TrainData() {
...
//Do a training-test split
trainData, testData := base.InstancesTrainTestSplit(rawData, 0.50)
tree := trees.NewID3DecisionTree(0.6)
...
tree.Save("model.h")
}
So I have saved the result to this binary file.
I want to be able to query the saved model with a label (a roomType) and get the probability that the room has water.
func HasWater(roomType string) float64 {
    tree := trees.NewID3DecisionTree(0.6)
    tree.Load("model.h")
    // What next??
}
I can't find any golearn examples that show how to actually use the trained binary file. I guess I should load the file, but then what?
Sorry for the basic question. I'm totally new to ML (and Go).
You have a training.csv, which you used to train the model and to test whether it works as expected. Once it did, you saved the model for later use.
training.csv
RoomType, HasWater
bathroom, true
living_room, false
storage, false
kitchen, true
...
// Read the training.csv file (true: the first row is a header)
rawData, err := base.ParseCSVToInstances("training.csv", true)
if err != nil {
    panic(err)
}
// Split into training and test data
trainData, testData := base.InstancesTrainTestSplit(rawData, 0.50)
// Use trainData to train the model
...
// Test whether the model works as expected
predictions, err := tree.Predict(testData)
...
// Save the model for later use
tree.Save("model.h")
Now the model is trained and working as expected. Next, check how it behaves on unknown data, that is, data that appears in neither the training nor the test set. To do that, load the previously saved model and use it to predict on the unknown data.
unknown.csv
RoomType, HasWater
ocean, true
truck, false
car, false
lake, true
...
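// Load the saved model into the same classifier type that was saved
tree := trees.NewID3DecisionTree(0.6)
if err := tree.Load("model.h"); err != nil {
    panic(err)
}
// true: the first row of unknown.csv is a header
unknownTestData, err := base.ParseCSVToInstances("unknown.csv", true)
if err != nil {
    panic(err)
}
predictions, err := tree.Predict(unknownTestData)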
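Because unknown.csv also carries the true HasWater column, the predictions can be scored against it. golearn's evaluation package has GetConfusionMatrix and GetAccuracy for this. The sketch below shows the same idea with only the standard library so the arithmetic is visible. The accuracy helper and the predicted labels are hypothetical, made up for illustration, and are not real model output.

```go
package main

import (
	"errors"
	"fmt"
)

// accuracy returns the fraction of predictions that equal the true labels.
// Hypothetical helper for illustration; it is not part of golearn.
func accuracy(actual, predicted []string) (float64, error) {
	if len(actual) == 0 || len(actual) != len(predicted) {
		return 0, errors.New("label slices must be non-empty and of equal length")
	}
	correct := 0
	for i := range actual {
		if actual[i] == predicted[i] {
			correct++
		}
	}
	return float64(correct) / float64(len(actual)), nil
}

func main() {
	// True HasWater values from unknown.csv (ocean, truck, car, lake).
	actual := []string{"true", "false", "false", "true"}
	// Made-up predictions, only to exercise the helper.
	predicted := []string{"true", "false", "true", "true"}

	acc, err := accuracy(actual, predicted)
	if err != nil {
		panic(err)
	}
	fmt.Printf("accuracy: %.2f\n", acc) // 3 of the 4 labels match
}
```

A low score here, next to a high score on the test split, would suggest the model does not generalise beyond the room types it was trained on.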
// Alternatively, build the unknown rows in code instead of reading a CSV.
// Create a new, empty DenseInstances
newInst := base.NewDenseInstances()
// Create the Attributes. Their names must match the ones the model was
// trained with ("0" and "1" if the training CSV was parsed without headers).
attrs := make([]base.Attribute, 2)
attrs[0] = new(base.CategoricalAttribute)
attrs[0].SetName("RoomType")
attrs[1] = new(base.CategoricalAttribute)
attrs[1].SetName("HasWater")
// Add the attributes and mark HasWater as the class to predict
newSpecs := make([]base.AttributeSpec, len(attrs))
newSpecs[0] = newInst.AddAttribute(attrs[0])
newSpecs[1] = newInst.AddAttribute(attrs[1])
if err := newInst.AddClassAttribute(attrs[1]); err != nil {
    panic(err)
}
// Allocate one row per sample and fill in the values (no header row here)
rows := [][2]string{{"ocean", "true"}, {"truck", "false"}, {"car", "false"}, {"lake", "true"}}
if err := newInst.Extend(len(rows)); err != nil {
    panic(err)
}
for i, row := range rows {
    for j, spec := range newSpecs {
        newInst.Set(spec, i, spec.GetAttribute().GetSysValFromString(row[j]))
    }
}
predictions, err := tree.Predict(newInst)
Here I only used CategoricalAttribute, but golearn has more attribute types, such as FloatAttribute and BinaryAttribute.
Note: unknown.csv contains data that appears in neither the training nor the test data, so it can be used to check how well the model performs on unknown data.
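The question asks for a probability rather than a true/false label. With a single categorical column such as RoomType, one way to estimate it is to count how often each room type has water in the training CSV. This is a plain frequency table, not what golearn's ID3 tree returns, and it says nothing about room types it has never seen. The name waterProbabilities and the extra kitchen row in the sample are assumptions for illustration.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strconv"
	"strings"
)

// waterProbabilities parses "RoomType, HasWater" CSV text (with a header
// line) and returns, per room type, the fraction of rows where HasWater
// is true. Hypothetical helper for illustration; it does not use golearn.
func waterProbabilities(csvText string) (map[string]float64, error) {
	reader := csv.NewReader(strings.NewReader(csvText))
	reader.TrimLeadingSpace = true // the sample file has a space after each comma
	records, err := reader.ReadAll()
	if err != nil {
		return nil, err
	}
	if len(records) < 2 {
		return nil, fmt.Errorf("need a header and at least one data row")
	}

	total := map[string]int{}
	withWater := map[string]int{}
	for _, rec := range records[1:] { // skip the header row
		if len(rec) < 2 {
			return nil, fmt.Errorf("row %v: expected two columns", rec)
		}
		hasWater, err := strconv.ParseBool(strings.TrimSpace(rec[1]))
		if err != nil {
			return nil, fmt.Errorf("row %v: %w", rec, err)
		}
		room := strings.TrimSpace(rec[0])
		total[room]++
		if hasWater {
			withWater[room]++
		}
	}

	probs := make(map[string]float64, len(total))
	for room, n := range total {
		probs[room] = float64(withWater[room]) / float64(n)
	}
	return probs, nil
}

func main() {
	// The four rows from the question plus one made-up kitchen row,
	// added so that at least one probability is not exactly 0 or 1.
	const data = `RoomType, HasWater
bathroom, true
living_room, false
storage, false
kitchen, true
kitchen, false
`
	probs, err := waterProbabilities(data)
	if err != nil {
		panic(err)
	}
	for _, room := range []string{"bathroom", "kitchen", "storage"} {
		fmt.Printf("%s: %.2f\n", room, probs[room])
	}
}
```

With these rows, kitchen comes out at 0.50 because one of its two rows has water. A room type missing from the map was never seen, which is worth handling explicitly rather than treating as probability 0.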