Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

F# load a tree from CSV file

Tags:

tree

f#

I am trying to learn F# and I really like what I have seen so far. I am trying to implement some C# code the F# way of thinking as an exercise to practice and learn.

I really apologize if it has been answered before, but I could not find an answer to solve all my problems.

We have a sales force structure where we have sales supervisors and plain sales people. A supervisor might or might not have a supervisor.

All sales data comes from another system in CSV format. At the time of reading a record, we don't know if that SalesPerson has reports or not.

I don't seem to understand how to load the tree in the immutable world of F#. I am sure there is a way.

Our simplified legacy C# code defines (translated into Enligsh)

public class SalesPerson
{
    public int Id { get; set; }
    public SalesPerson Supervisor { get; set; }
    public List<SalesPerson> Reports { get; private set; } = new List<SalesPerson>();
    public PersonalSales double { get; set; }
    public GroupSales double { get; set; }
}

This is an oversimplified version of the code. However, the problem remains the same: How to load the tree?

I came up with the following F# type

type SalesPerson = {
    Id : int
    Supervisor : SalesPerson option
    Reports : List<SalesPerson> option
    PersonalSales : double
    GroupSales : double
}

I am not even sure if this the F# way of defining the type.

My problems are:

  1. Supervisor points to another SalesPerson and it is immutable. If it gets replaced by a new one (because immutable data works), the reference will break.
  2. Reports is immutable. I think that I could use C#'s List<T> but I am not sure if that is the F# way.
  3. Supervisor's Reports records do not follow the Supervisor record. They might come X number of lines below and not all of them together. However, the system ensures that the Supervisor record will always come before any Reports record for that supervisor.
  4. How to update GroupSales calculated field after the tree is loaded.

A sample CSV file would look like:

1,,100.00
2,,110.00
3,1,50.00
4,1,75.00
5,2,80.00
6,,92.00

So:

1 -> 2 reports
2 -> 1 report
3,4,5,6 -> No reports

I would really appreciate any "light" you might shine on these issues.

Thanks...

like image 263
Crazy Spaniard Avatar asked Sep 04 '17 10:09

Crazy Spaniard


2 Answers

This becomes a bit easier if you separate the tree structure into a separate type. The usual approach to immutable trees is something like this:

let rawData =
    [ 1, None, 100.00
      2, None, 110.00
      3, Some 1, 50.00
      4, Some 1, 75.00
      5, Some 2, 80.00
      6, None, 92.00 ]

let dataMap = rawData |> List.groupBy (fun (_, superId, _) -> superId) |> Map
let getChildrenData personId = dataMap |> Map.tryFind personId |> Option.defaultValue []

type Tree<'a> = { Data: 'a; Children : List<Tree<'a>> }

type SalesPerson = { Id : int; SupervisorId : int option; PersonalSales : double; GroupSales : double }

let salesPersonTree =
    let rec buildNode (id, superId, sales) =
        let children = getChildrenData (Some id) |> List.map buildNode
        let groupSales = (children |> List.sumBy (fun x -> x.Data.GroupSales)) + sales
        { Data = { Id = id; SupervisorId = superId; PersonalSales = sales; GroupSales = groupSales }
          Children = children }

    let topLevelItems = getChildrenData None
    topLevelItems |> List.map buildNode

In summary: group the data by parent and then use a recursive function build up the tree starting with the top nodes (the ones that have no parent). Because we have built all descendant nodes we finish building any given node, we can use the descendant data to calculate GroupSales.

You can't access the parent directly from a given node, but you do have the parent ID. As long as you keep the original salesPeople list you can get the full data for any given parent ID.

One advantage to having a generic Tree type is that you can have re-usable functions (e.g. map, fold, tryFind) that work on any tree.

like image 146
TheQuickBrownFox Avatar answered Nov 19 '22 17:11

TheQuickBrownFox


@TheQuickBrownFox did a good job modeling your domain.

type Employee = { Id : int; SupervisorId : int option; PersonalSales : double }

Using a record / class to represent a Tree is an OO way of handling things, which might be easier to grasp when you don't have lot of experience in FP.

I want to show you a more functional approach.

type 'a Tree =
  | Leaf of 'a
  | Branch of 'a * 'a Tree list

The Leaf nodes are the SalesPersons at the end of the hierarchy. The Supervisors and all their minions are represented by Branches and go all the way up.

type SalesMember =
  | SalesPerson of Employee
  | Supervisor of Employee * SalesMember List

A Tree would also have a root node -- there can be only one -- you could easily write a function to transform rawData to something like:

let rawData =
    [ 0, None, 0.0
      1, Some 0, 100.00
      2, Some 0, 110.00
      3, Some 1, 50.00
      4, Some 1, 75.00
      5, Some 2, 80.00
      6, Some 0, 92.00 ]

let flatList =
    rawData
    |> List.map (fun (id, superId, sales) -> 
                     {Id = id; SupervisorId = superId; PersonalSales = sales})

let getTree salesPeople =
    // To do : validate root
    let root = salesPeople |> List.find (fun p -> p.SupervisorId = None)
    let children supervisorId = 
        salesPeople |> List.filter (fun p -> p.SupervisorId = Some supervisorId)

    let rec loop employee = 
      match children employee.Id with
      | [] -> SalesPerson employee
      | list -> Supervisor (employee, List.map loop list)

    loop root

let salesForce = getTree flatList 

To implement GroupSales you could expand Supervisor.

type SalesMember =
  | SalesPerson of emp : Employee
  | Supervisor of emp : Employee * reports : List<SalesMember> * groupSales : double

One way of building an instance of this tree is by transforming the tree from the getTree function. Processing, transforming and optimizing trees is a broad topic, as always for fun and profit is a good place to start your journey.

UPDATE - GroupSales

To keep it simple I'll use only one Discriminated Union, setting GroupSales to zero on the first run. You could, however, easily adapt the code to transform to another type of Tree.

type Employee = { Id : int; SupervisorId : int option; PersonalSales : double }

type GroupSales = double
type SalesMember =
  | SalesPerson of Employee
  | Supervisor of Employee * SalesMember List * GroupSales

let rawData =
    [ 0, None, 0.
      1, Some 0, 100.00
      2, Some 0, 110.00
      3, Some 1, 50.00
      4, Some 1, 75.00
      5, Some 2, 80.00
      6, Some 0, 92.00 ]

let flatList =
    rawData
    |> List.map (fun (id, superId, sales) -> 
                     {Id = id; SupervisorId = superId; PersonalSales = sales})

let getTree salesPeople =
    let root = salesPeople |> List.find (fun p -> p.SupervisorId = None)
    let children supervisorId = 
        salesPeople |> List.filter (fun p -> p.SupervisorId = Some supervisorId)

    let rec loop employee = 
        match children employee.Id with
          | [] -> SalesPerson employee
          | list -> Supervisor (employee, List.map loop list, 0.)

    loop root

let transformTree root =
    let rec getGroupSales = function
      | SalesPerson emp 
        -> emp.PersonalSales
      | Supervisor (sup, reports, _) 
        -> sup.PersonalSales + List.sumBy getGroupSales reports

    let rec loop = function
      | Supervisor (sup, reports, _) as mem
        -> Supervisor (sup, List.map loop reports, getGroupSales mem)
      | salesPerson -> salesPerson

    loop root

let salesForce = 
    flatList 
    |> getTree
    |> transformTree

A less naive implementation would transform / calc GroupSales bottom-up instead of top-down, allowing you to use already calculated GroupSales.

like image 2
Funk Avatar answered Nov 19 '22 18:11

Funk