Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Help Needed Creating a Binary Tree Given Truth Table

Tags:

f#

First, in order to provide full disclosure, I want to point out that this is related to homework in a Machine Learning class. This question is not the homework question and instead is something I need to figure out in order to complete the bigger problem of creating an ID3 Decision Tree Algorithm.

I need to generate tree similar to the following when given a truth table

let learnedTree = Node(0,"A0", Node(2,"A2", Leaf(0), Leaf(1)), Node(1,"A1", Node(2,"A2", Leaf(0), Leaf(1)), Leaf(0)))

learnedTree is of type BinaryTree which I've defined as follows:

type BinaryTree =
    | Leaf of int
    | Node of int * string * BinaryTree * BinaryTree

ID3 algorithms take into account various equations to determine where to split the tree, and I've got all that figured out, I'm just having trouble creating the learned tree from my truth table. For example if I have the following table

A1 | A2 | A3 | Class
1     0    0      1
0     1    0      1
0     0    0      0
1     0    1      0
0     0    0      0
1     1    0      1
0     1    1      0

And I decide to split on attribute A1 I would end up with the following:

              (A1 = 1)  A1   (A1 = 0)
   A2 | A3 | Class                A2 | A3 | Class
   0     0      1                1      0      1
   0     1      0                0      0      0
   1     0      1                0      0      0
                                 0      1      1

Then I would split the left side and split the right side, and continue the recursive pattern until the leaf nodes are pure and I end up with a tree similar to the following based on the splitting.

let learnedTree = Node(0,"A0", Node(2,"A2", Leaf(0), Leaf(1)), Node(1,"A1", Node(2,"A2", Leaf(0), Leaf(1)), Leaf(0)))

Here is what I've kind of "hacked" together thus far, but I think I might be way off:

let rec createTree (listToSplit : list<list<float>>) index =
    let leftSideSplit =
        listToSplit |> List.choose (fun x -> if x.Item(index) = 1. then Some(x) else None)
    let rightSideSplit =
        listToSplit |> List.choose (fun x -> if x.Item(index) = 0. then Some(x) else None)
    if leftSideSplit.Length > 0 then
        let pureCheck = isListPure leftSideSplit
        if pureCheck = 0 then
            printfn "%s" "Pure left node class 0"
            createTree leftSideSplit (index + 1)
        else if pureCheck = 1 then
            printfn "%s" "Pure left node class 1"
            createTree leftSideSplit (index + 1)
        else
            printfn "%s - %A" "Recursing Left" leftSideSplit
            createTree leftSideSplit (index + 1)
    else printfn "%s" "Pure left node class 0"

Should I be using pattern matching instead? Any tips/ideas/help? Thanks a bunch!

like image 621
Jim Martin Avatar asked Sep 13 '09 21:09

Jim Martin


2 Answers

Edit: I've since posted an implementation of ID3 on my blog at: http://blogs.msdn.com/chrsmith

Hey Jim, I've been wanting to write a blog post implementing ID3 in F# for a while - thanks for giving me an execute. While this code doesn't implement the algorithm full (or correctly), it should be sufficient for getting you started.

In general you have the right approach - representing each branch as a discriminated union case is good. And like Brian said, List.partition is definitely a handy function. The trick to making this work correctly is all in determining the optimal attribute/value pair to split on - and to do that you'll need to calculate information gain via entropy, etc.

type Attribute = string
type Value = string

type Record = 
    {
        Weather : string
        Temperature : string
        PlayTennis : bool 
    }
    override this.ToString() =
        sprintf
            "{Weather = %s, Temp = %s, PlayTennis = %b}" 
            this.Weather 
            this.Temperature 
            this.PlayTennis

type Decision = Attribute * Value

type DecisionTreeNode =
    | Branch of Decision * DecisionTreeNode * DecisionTreeNode
    | Leaf of Record list

// ------------------------------------

// Splits a record list into an optimal split and the left / right branches.
// (This is where you use the entropy function to maxamize information gain.)
// Record list -> Decision * Record list * Record list
let bestSplit data = 
    // Just group by weather, then by temperature
    let uniqueWeathers = 
        List.fold 
            (fun acc item -> Set.add item.Weather acc) 
            Set.empty
            data

    let uniqueTemperatures = 
        List.fold
            (fun acc item -> Set.add item.Temperature acc) 
            Set.empty
            data

    if uniqueWeathers.Count = 1 then
        let bestSplit = ("Temperature", uniqueTemperatures.MinimumElement)
        let left, right = 
            List.partition
                (fun item -> item.Temperature = uniqueTemperatures.MinimumElement) 
                data
        (bestSplit, left, right)
    else
        let bestSplit = ("Weather", uniqueWeathers.MinimumElement)
        let left, right =
            List.partition
                (fun item -> item.Weather = uniqueWeathers.MinimumElement)
                data
        (bestSplit, left, right)

let rec determineBranch data =
    if List.length data < 4 then
        Leaf(data)
    else
        // Use the entropy function to break the dataset on
        // the category / value that best splits the data
        let bestDecision, leftBranch, rightBranch = bestSplit data
        Branch(
            bestDecision, 
            determineBranch leftBranch, 
            determineBranch rightBranch)

// ------------------------------------    

let rec printID3Result indent branch =
    let padding = new System.String(' ', indent)
    match branch with
    | Leaf(data) ->
        data |> List.iter (fun item -> printfn "%s%s" padding <| item.ToString())
    | Branch(decision, lhs, rhs) ->
        printfn "%sBranch predicate [%A]" padding decision
        printfn "%sWhere predicate is true:" padding
        printID3Result (indent + 4) lhs
        printfn "%sWhere predicate is false:" padding
        printID3Result (indent + 4) rhs


// ------------------------------------    

let dataset =
    [
        { Weather = "windy"; Temperature = "hot";  PlayTennis = false }
        { Weather = "windy"; Temperature = "cool"; PlayTennis = false }
        { Weather = "nice";  Temperature = "cool"; PlayTennis = true }
        { Weather = "nice";  Temperature = "cold"; PlayTennis = true }
        { Weather = "humid"; Temperature = "hot";  PlayTennis = false }
    ]

printfn "Given input list:"
dataset |> List.iter (printfn "%A")

printfn "ID3 split resulted in:"
let id3Result = determineBranch dataset
printID3Result 0 id3Result
like image 119
Chris Smith Avatar answered Sep 28 '22 09:09

Chris Smith


You can use List.partition instead of your two List.choose calls.

http://research.microsoft.com/en-us/um/cambridge/projects/fsharp/manual/FSharp.Core/Microsoft.FSharp.Collections.List.html

(or now http://msdn.microsoft.com/en-us/library/ee353738(VS.100).aspx )

It isn't clear to me that pattern matching will buy you much here; the input type (list of lists) and processing (partitioning and 'pureness' check) doesn't really lend itself to that.

And of course when you finally get the 'end' (a pure list) you need to create a tree, and then presumably this function will create a Leaf when the input only has one 'side' and it's 'pure', but create a Node out of the left-side and right-side results for every other input. Maybe. I didn't quite grok the algorithm completely.

Hopefully that will help steer you a little bit. May be useful to draw up a few smaller sample inputs and outputs to help work out the various cases of the function body.

like image 38
Brian Avatar answered Sep 28 '22 09:09

Brian