Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Solving knapsack prob in F#: performance

I found an article:
Solving the 0-1 knapsack problem using continuation-passing style with memoization in F#

about knapsack problem implemented in F#. As I'm learning this language, I found this really interesting and tried to investigate this a bit. Here's the code I crafted:

open System
open System.IO 
open System.Collections.Generic

let parseToTuple (line : string) =
    let parsedLine = line.Split(' ') |> Array.filter(not << String.IsNullOrWhiteSpace)         |> Array.map Int32.Parse
    (parsedLine.[0], parsedLine.[1])

let memoize f =
    let cache = Dictionary<_, _>()
    fun x ->
        if cache.ContainsKey(x)
            then cache.[x]
        else
            let res = f x
            cache.[x] <- res
            res

type Item =
    {
        Value : int
        Size  : int
    }  

type ContinuationBuilder() = 
    member b.Bind(x, f) = fun k -> x (fun x -> f x k)
    member b.Return x = fun k ->  k x
    member b.ReturnFrom x = x

let cont = ContinuationBuilder()

let set1 =
    [
        (4, 11)
        (8, 4)
        (10, 5)
        (15, 8)
        (4, 3)
    ]

let set2 =
    [
        (50, 341045); (1906, 4912); (41516, 99732); (23527, 56554); (559, 1818); (45136, 108372); (2625, 6750); (492, 1484)
        (1086, 3072); (5516, 13532); (4875, 12050); (7570, 18440); (4436, 10972); (620, 1940); (50897, 122094); (2129, 5558)
        (4265, 10630); (706, 2112); (2721, 6942); (16494, 39888); (29688, 71276); (3383, 8466); (2181, 5662); (96601, 231302)
        (1795, 4690); (7512, 18324); (1242, 3384); (2889, 7278); (2133, 5566); (103, 706); (4446, 10992); (11326, 27552)
        (3024, 7548); (217, 934); (13269, 32038); (281, 1062); (77174, 184848); (952, 2604); (15572, 37644); (566, 1832)
        (4103, 10306); (313, 1126); (14393, 34886); (1313, 3526); (348, 1196); (419, 1338); (246, 992); (445, 1390)
        (23552, 56804); (23552, 56804); (67, 634)
    ]

[<EntryPoint>] 
let main args =
    // prepare list of items from a file args.[0]
    let header, items = set1
                        |> function
                           | h::t -> h, t
                           | _    -> raise (Exception("Wrong data format"))

    let N, K = header
    printfn "N = %d, K = %d" N K
    let items = List.map (fun x -> {Value = fst x ; Size = snd x}) items |> Array.ofList

    let rec combinations =
        let innerSolver key =
            cont
                {
                    match key with
                    | (i, k) when i = 0 || k = 0        -> return 0
                    | (i, k) when items.[i-1].Size > k  -> return! combinations (i-1, k)
                    | (i, k)                            -> let item = items.[i-1]
                                                           let! v1 = combinations (i-1, k)
                                                           let! beforeItem = combinations (i-1, k-item.Size)
                                                           let v2 = beforeItem + item.Value
                                                           return max v1 v2
                }
        memoize innerSolver

    let res = combinations (N, K) id
    printfn "%d" res
    0

However, the problem with this implementation is that it's veeeery slow (in practice I'm unable to solve problem with 50 items and capacity of ~300000, which gets solved by my naive implementation in C# in less than 1s).

Could you tell me if I made a mistake somewhere? Or maybe the implementation is correct and this is simply the inefficient way of solving this problem.

like image 641
Alojzy Leszcz Avatar asked Jul 03 '13 10:07

Alojzy Leszcz


People also ask

What is the formula for knapsack problem?

The maximum value when selected in n packages with the weight limit M is B[n][M]. In other words: When there are i packages to choose, B[i][j] is the optimal weight when the maximum weight of the knapsack is j. The optimal weight is always less than or equal to the maximum weight: B[i][j] ≤ j.

Which method solves knapsack problem?

Which of the following methods can be used to solve the Knapsack problem? Explanation: Brute force, Recursion and Dynamic Programming can be used to solve the knapsack problem.

Is knapsack NP or P?

Theorem 1 Knapsack is NP-complete. Proof: First of all, Knapsack is NP. The proof is the set S of items that are chosen and the verification process is to compute ∑i∈S si and ∑i∈S vi, which takes polynomial time in the size of input.


2 Answers

When you naively apply a generic memoizer like this, and use continuation passing, the values in your memoization cache are continuations, not regular "final" results. Thus, when you get a cache hit, you aren't getting back a finalized result, you are getting back some function which promises to compute a result when you invoke it. This invocation might be expensive, might invoke various other continuations, might ultimately hit the memoization cache again itself, etc.

Effectively memoizing continuation-passing functions such that a) the caching works to full effect and b) the function remains tail-recursive is quite difficult. Read this discussion and come back when you fully understand it all. ;-)

The author of the blog post you linked is using a more sophisticated, less generic memoizer which is specially fitted to the problem. Admittedly, I don't fully grok it yet (code on the blog is incomplete/broken, so hard to try it out), but I think the gist of it is that it "forces" the chain of continuations before caching the final integer result.

To illustrate the point, here's a quick refactor of your code which is fully self-contained and traces out relevant info:

open System
open System.Collections.Generic

let mutable cacheHits = 0
let mutable cacheMisses = 0

let memoize f =
    let cache = Dictionary<_, _>()
    fun x ->
        match cache.TryGetValue(x) with
        | (true, v) -> 
            cacheHits <- cacheHits + 1
            printfn "Hit for %A - Result is %A" x v
            v
        | _ ->
            cacheMisses <- cacheMisses + 1
            printfn "Miss for %A" x
            let res = f x
            cache.[x] <- res
            res

type Item = { Value : int; Size  : int }  

type ContinuationBuilder() = 
    member b.Bind(x, f) = fun k -> x (fun x -> f x k)
    member b.Return x = fun k ->  k x
    member b.ReturnFrom x = x

let cont = ContinuationBuilder()

let genItems n = 
   [| for i = 1 to n do
         let size = i % 5
         let value = (size * i)
         yield { Value = value; Size = size }
   |]

let N, K = (5, 100)
printfn "N = %d, K = %d" N K

let items = genItems N

let rec combinations_cont =
    memoize (
     fun key ->
       cont {
                match key with
                | (0, _) | (_, 0)                   -> return 0
                | (i, k) when items.[i-1].Size > k  -> return! combinations_cont (i - 1, k) 
                | (i, k)                            -> let item = items.[i-1]
                                                       let! v1 = combinations_cont (i-1, k)
                                                       let! beforeItem = combinations_cont (i-1, k - item.Size)
                                                       let v2 = beforeItem + item.Value
                                                       return max v1 v2
        }
    )

let res = combinations_cont (N, K) id
printfn "Answer: %d" res
printfn "Memo hits: %d" cacheHits
printfn "Memo misses: %d" cacheMisses
printfn ""

let rec combinations_plain =
    memoize (
     fun key ->
                match key with
                | (i, k) when i = 0 || k = 0        -> 0
                | (i, k) when items.[i-1].Size > k  -> combinations_plain (i-1, k) 
                | (i, k)                            -> let item = items.[i-1]
                                                       let v1 = combinations_plain (i-1, k)
                                                       let beforeItem = combinations_plain (i-1, k-item.Size)
                                                       let v2 = beforeItem + item.Value
                                                       max v1 v2
    )

cacheHits <- 0
cacheMisses <- 0

let res2 = combinations_plain (N, K)
printfn "Answer: %d" res2
printfn "Memo hits: %d" cacheHits
printfn "Memo misses: %d" cacheMisses

As you can see, the CPS version is caching continuations (not integers), and there are is a lot of extra activity going on toward the end as the continuations are invoked.

If you boost the problem size to let (N, K) = (20, 100) (and remove the printfn statements in the memoizer), you will see that the CPS version ends up doing over 1 million cache lookups, compared to plain version doing only a few hundred.

like image 153
latkin Avatar answered Sep 17 '22 08:09

latkin


From running this code in FSI:

open System
open System.Diagnostics
open System.Collections.Generic

let time f =
    System.GC.Collect()
    let sw = Stopwatch.StartNew()
    let r = f()
    sw.Stop()
    printfn "Took: %f" sw.Elapsed.TotalMilliseconds
    r

let mutable cacheHits = 0
let mutable cacheMisses = 0

let memoize f =
    let cache = Dictionary<_, _>()
    fun x ->
        match cache.TryGetValue(x) with
        | (true, v) -> 
            cacheHits <- cacheHits + 1
            //printfn "Hit for %A - Result is %A" x v
            v
        | _ ->
            cacheMisses <- cacheMisses + 1
            //printfn "Miss for %A" x
            let res = f x
            cache.[x] <- res
            res

type Item = { Value : int; Size  : int }  

type ContinuationBuilder() = 
    member b.Bind(x, f) = fun k -> x (fun x -> f x k)
    member b.Return x = fun k ->  k x
    member b.ReturnFrom x = x

let cont = ContinuationBuilder()

let genItems n = 
    [| for i = 1 to n do
            let size = i % 5
            let value = (size * i)
            yield { Value = value; Size = size }
    |]

let N, K = (80, 400)
printfn "N = %d, K = %d" N K

let items = genItems N

//let rec combinations_cont =
//    memoize (
//     fun key ->
//       cont {
//                match key with
//                | (0, _) | (_, 0)                   -> return 0
//                | (i, k) when items.[i-1].Size > k  -> return! combinations_cont (i - 1, k) 
//                | (i, k)                            -> let item = items.[i-1]
//                                                       let! v1 = combinations_cont (i-1, k)
//                                                       let! beforeItem = combinations_cont (i-1, k - item.Size)
//                                                       let v2 = beforeItem + item.Value
//                                                       return max v1 v2
//        }
//    )
//
//
//cacheHits <- 0
//cacheMisses <- 0

//let res = time(fun () -> combinations_cont (N, K) id)
//printfn "Answer: %d" res
//printfn "Memo hits: %d" cacheHits
//printfn "Memo misses: %d" cacheMisses
//printfn ""

let rec combinations_plain =
    memoize (
        fun key ->
                match key with
                | (i, k) when i = 0 || k = 0        -> 0
                | (i, k) when items.[i-1].Size > k  -> combinations_plain (i-1, k) 
                | (i, k)                            -> let item = items.[i-1]
                                                       let v1 = combinations_plain (i-1, k)
                                                       let beforeItem = combinations_plain (i-1, k-item.Size)
                                                       let v2 = beforeItem + item.Value
                                                       max v1 v2
    )

cacheHits <- 0
cacheMisses <- 0

printfn "combinations_plain"
let res2 = time (fun () -> combinations_plain (N, K))
printfn "Answer: %d" res2
printfn "Memo hits: %d" cacheHits
printfn "Memo misses: %d" cacheMisses
printfn ""

let recursivelyMemoize f =
    let cache = Dictionary<_, _>()
    let rec memoizeAux x =
        match cache.TryGetValue(x) with
        | (true, v) -> 
            cacheHits <- cacheHits + 1
            //printfn "Hit for %A - Result is %A" x v
            v
        | _ ->
            cacheMisses <- cacheMisses + 1
            //printfn "Miss for %A" x
            let res = f memoizeAux x
            cache.[x] <- res
            res
    memoizeAux

let combinations_plain2 =
    let combinations_plain2Aux combinations_plain2Aux key =
                match key with
                | (i, k) when i = 0 || k = 0        -> 0
                | (i, k) when items.[i-1].Size > k  -> combinations_plain2Aux (i-1, k) 
                | (i, k)                            -> let item = items.[i-1]
                                                       let v1 = combinations_plain2Aux (i-1, k)
                                                       let beforeItem = combinations_plain2Aux (i-1, k-item.Size)
                                                       let v2 = beforeItem + item.Value
                                                       max v1 v2
    let memoized = recursivelyMemoize combinations_plain2Aux
    fun x -> memoized x

cacheHits <- 0
cacheMisses <- 0

printfn "combinations_plain2"
let res3 = time (fun () -> combinations_plain2 (N, K))
printfn "Answer: %d" res3
printfn "Memo hits: %d" cacheHits
printfn "Memo misses: %d" cacheMisses
printfn ""

let recursivelyMemoizeCont f =
    let cache = Dictionary HashIdentity.Structural
    let rec memoizeAux x k =
        match cache.TryGetValue(x) with
        | (true, v) -> 
            cacheHits <- cacheHits + 1
            //printfn "Hit for %A - Result is %A" x v
            k v
        | _ ->
            cacheMisses <- cacheMisses + 1
            //printfn "Miss for %A" x
            f memoizeAux x (fun y ->
                cache.[x] <- y
                k y)
    memoizeAux

let combinations_cont2 =
    let combinations_cont2Aux combinations_cont2Aux key =
        cont {
                match key with
                | (0, _) | (_, 0)                   -> return 0
                | (i, k) when items.[i-1].Size > k  -> return! combinations_cont2Aux (i - 1, k) 
                | (i, k)                            -> let item = items.[i-1]
                                                       let! v1 = combinations_cont2Aux (i-1, k)
                                                       let! beforeItem = combinations_cont2Aux (i-1, k - item.Size)
                                                       let v2 = beforeItem + item.Value
                                                       return max v1 v2
        }
    let memoized = recursivelyMemoizeCont combinations_cont2Aux
    fun x -> memoized x id

cacheHits <- 0
cacheMisses <- 0

printfn "combinations_cont2"
let res4 = time (fun () -> combinations_cont2 (N, K))
printfn "Answer: %d" res4
printfn "Memo hits: %d" cacheHits
printfn "Memo misses: %d" cacheMisses
printfn ""

I get these results:

N = 80, K = 400
combinations_plain
Took: 7.191000
Answer: 6480
Memo hits: 6231
Memo misses: 6552

combinations_plain2
Took: 6.310800
Answer: 6480
Memo hits: 6231
Memo misses: 6552

combinations_cont2
Took: 17.021200
Answer: 6480
Memo hits: 6231
Memo misses: 6552
  • combinations_plain is from latkin's answer.
  • combinations_plain2 exposes the recursive memoization step explicitly.
  • combinations_cont2 adapts the recursive memoization function into one that memoizes the continuation results.
  • combinations_cont2 works by intercepting the result in the continuation before passing it on to the actual continuation. Subsequent calls on the same key provide a continuation and this continuation is fed the answer we intercepted originally.

This demonstrates that we are able to:

  1. Memoize using continuation passing style.
  2. Achieve similar (ish) performance characteristics to the vanilla memoized version.

I hope this clears things up a little. Sorry, my blog code snippet was incomplete (I think I might have lost it when reformatting recently).

like image 43
ZachBray Avatar answered Sep 19 '22 08:09

ZachBray