I decided to use F# to solve the second problem for the first day of Advent of Code 2018 (performing a cyclic summation and finding the first repeated sum) in an expressive manner, but the performance is lacking and I can't find the cause of the slowdown.
For a given input with ~140,000 summations, this code executes in a few seconds.
data = list(map(int, '''
+1
-1
'''.strip().splitlines()))
from itertools import cycle, accumulate
class superset(set):
def add(self, other):
super().add(other)
return other
def mapwhile(func, pred, iterable):
for i in iterable:
if not pred(i):
yield func(i)
return
yield func(i)
def last(iterable):
return list(iterable)[-1]
s = superset([0])
print(last(mapwhile(s.add, lambda x: x not in s, accumulate(cycle(data)))))
I added a conditional breakpoint on the match expression to time every thousandth i
, it seems this code performs ~100 sums/sec and would not come to a solution even after an hour. A dramatic slowdown by a ridiculous order in magnitude.
let input = @"
+1
-1
"
let cycle xs = seq { while true do yield! xs }
let accumusum xs = Seq.scan(fun acc elem -> acc + elem) 0 xs
let rec findfreqcycle i (s:int Set) (data:int seq) =
let head, tail = Seq.head data, Seq.tail data
match s.Contains(head) with
| false -> findfreqcycle (i+1) (s.Add(head)) (tail)
| true -> head
let data = input.Trim().Split('\n') |> Seq.map int |> Seq.toList |> cycle
accumusum data |> findfreqcycle 0 Set.empty
As far as I can tell, the core ideas behind each code sample are more or less the same. The input is eagerly parsed only once, with a generator function/sequence to lazily repeat each number.
The only difference is that the function to actually find the first repeating summation is recursive in the F# example. Memory profiling indicates almost constant memory usage, and tail recursion is active.
What could I be doing wrong, and how can I better profile these recursive and generative functions for performance?
As mentioned in the comments, Seq.tail
is horribly inefficient, especially if you use it in a loop in the way you do. The reason is that it creates a new sequence that iterates over the original sequence and skips the first element (so, after 1000 iterations, you have to go over 1000 sequences, each skipping one element).
The pattern with head and tail works much better if you use a list, because functional lists have been designed for this kind of processing. In your case, you could do something like this (which follows the same pattern as your original function):
let rec findfreqcycle sum (s:int Set) input data =
match data with
| x::xs when s.Contains (sum + x) -> (sum + x)
| x::xs -> findfreqcycle (sum + x) (s.Add (sum + x)) input xs
| [] -> findfreqcycle sum s input input
let data = input.Trim().Split('\n') |> Seq.map int |> Seq.toList
findfreqcycle 0 Set.empty data data
I changed it so that it uses pattern matching (on lists). I also changed the code so that it takes a finite list and, when it gets to the end, it starts again. As a consequence, it also sums the numbers on the fly (rather than using Seq.scan
- that would not work here, because I'm not using infinite lists).
On the input from Pastebin, I get the result 448 in about 0.17 seconds.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With