Will any functional language compiler/runtime reduce all chained iterations into one when applyable? From the programmer perspective we could optimize the functional code with such constructs as lazyness and streams but I am interested to know the other side of the story. My functional example is written in Scala but please don't limit your answers to that language.
Functional way:
// I assume the following line of code will go
// through the collection 3 times, one for creating it
// one for filtering it and one for summing it
val sum = (1L to 1000000L).filter(_ % 2 == 0).sum // => 250000500000
I would like the compiler to optimize to the imperative equivalent of:
/* One iteration only */
long sum, i;
for (i = 1L, sum = 0L; i <= 1000000L; i++) {
if (i % 2 == 0)
sum += i;
}
Haskell is a non-strict language by definition, all implementations I'm aware of use lazy evaluation to provide the non-strict semantics.
The analogous code (with arguments for the start and end, so a compile-time evaluation isn't possible)
val :: Int -> Int -> Int
val low high = sum $ filter even [low .. high]
computes the sum with only one traversal, and in constant small memory. [low .. high]
is syntactic sugar for enumFromTo low high
, and the definition of enumFromTo
for Int
is basically
enumFromTo x y
| y < x = []
| otherwise = go x
where
go k = k : if k == y then [] else go (k+1)
(actually, GHC's implementation uses unboxed Int#
s for reasons of efficiency in the worker go
, but that has no influence on the semantics; for other Integral
types, the definition is analogous).
The definition of filter
is
filter :: (a -> Bool) -> [a] -> [a]
filter _pred [] = []
filter pred (x:xs)
| pred x = x : filter pred xs
| otherwise = filter pred xs
and sum
:
sum l = sum' l 0
where
sum' [] a = a
sum' (x:xs) a = sum' xs (a+x)
Assembling that, even without any optimisations, the evaluation would proceed
sum' (filter even (enumFromTo 1 6)) 0
-- Now it must be determined whether the first argument of sum' is [] or not
-- For that, the application of filter must be evaluated
-- For that, enumFromTo must be evaluated
~> sum' (filter even (1 : go 2)) 0
-- Now filter knows which equation to use, unfortunately, `even 1` is False
~> sum' (filter even (go 2)) 0
~> sum' (filter even (2 : go 3)) 0
-- 2 is even, so
~> sum' (2 : filter even (go 3)) 0
~> sum' (filter even (go 3)) (0+2)
-- Once again, sum asks whether filter is done or not, so filter demands another value or []
-- from go
~> sum' (filter even (3 : go 4)) 2
~> sum' (filter even (go 4)) 2
~> sum' (filter even (4 : go 5)) 2
~> sum' (4 : filter even (go 5)) 2
~> sum' (filter even (go 5)) (2+4)
~> sum' (filter even (5 : go 6)) 6
~> sum' (filter even (go 6)) 6
~> sum' (filter even (6 : [])) 6
~> sum' (6 : filter even []) 6
~> sum' (filter even []) (6+6)
~> sum' [] 12
~> 12
That would of course be less efficient than the loop, since for each element of the enumeration, a list cell has to be produced, then for each element passing the filter a list cell has to be produced, only to be immediately consumed by the sum.
Let's check that the memory usage is indeed small:
module Main (main) where
import System.Environment (getArgs)
main :: IO ()
main = do
args <- getArgs
let (low, high) = case args of
(a:b:_) -> (read a, read b)
_ -> error "Want two args"
print $ sum $ filter even [low :: Int .. high]
and run it,
$ ./sumEvens +RTS -s -RTS 1 1000000
250000500000
40,071,856 bytes allocated in the heap
12,504 bytes copied during GC
44,416 bytes maximum residency (2 sample(s))
21,120 bytes maximum slop
1 MB total memory in use (0 MB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 75 colls, 0 par 0.00s 0.00s 0.0000s 0.0000s
Gen 1 2 colls, 0 par 0.00s 0.00s 0.0002s 0.0003s
INIT time 0.00s ( 0.00s elapsed)
MUT time 0.01s ( 0.01s elapsed)
GC time 0.00s ( 0.00s elapsed)
EXIT time 0.00s ( 0.00s elapsed)
Total time 0.01s ( 0.01s elapsed)
%GC time 6.1% (7.6% elapsed)
Alloc rate 4,367,976,530 bytes per MUT second
Productivity 91.8% of total user, 115.8% of total elapsed
It allocated about 40MB for 0.5 million list cells(1) and a bit of change, but the maximum residency was about 44KB. Running it with an upper limit of 10 million, the overall allocation (and running time) grows by a factor of 10 (minus constant stuff), but the maximum residency remains the same.
(1) GHC fuses the enumeration and the filter, and produces only the even numbers in the range at type Int
. Unfortunately, it cannot fuse away sum
, since that is a left fold, and GHC's fusion framework only fuses right folds.
Now, to fuse also the sum
, one must do a lot of work teaching GHC to do that with rewrite rules. Fortunately, that has been done for many algorithms in the vector
package, and if we use that,
module Main where
import qualified Data.Vector.Unboxed as U
import System.Environment (getArgs)
val :: Int -> Int -> Int
val low high = U.sum . U.filter even $ U.enumFromN low (high - low + 1)
main :: IO ()
main = do
args <- getArgs
let (low, high) = case args of
(a:b:_) -> (read a, read b)
_ -> error "Want two args"
print $ val low high
we get a faster programme that doesn't even allocate any list cells anymore, the pipeline is really rewritten to the loop:
$ ./sumFilter +RTS -s -RTS 1 10000000
25000005000000
72,640 bytes allocated in the heap
3,512 bytes copied during GC
44,416 bytes maximum residency (1 sample(s))
17,024 bytes maximum slop
1 MB total memory in use (0 MB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 0 colls, 0 par 0.00s 0.00s 0.0000s 0.0000s
Gen 1 1 colls, 0 par 0.00s 0.00s 0.0001s 0.0001s
INIT time 0.00s ( 0.00s elapsed)
MUT time 0.01s ( 0.01s elapsed)
GC time 0.00s ( 0.00s elapsed)
EXIT time 0.00s ( 0.00s elapsed)
Total time 0.01s ( 0.01s elapsed)
%GC time 1.0% (1.2% elapsed)
Alloc rate 7,361,805 bytes per MUT second
Productivity 97.7% of total user, 111.5% of total elapsed
Here's the core that GHC produces for (the worker of) val
, if somebody is interested:
Rec {
Main.main_$s$wfoldlM'_loop [Occ=LoopBreaker]
:: GHC.Prim.Int# -> GHC.Prim.Int# -> GHC.Prim.Int# -> GHC.Prim.Int#
[GblId, Arity=3, Caf=NoCafRefs, Str=DmdType LLL]
Main.main_$s$wfoldlM'_loop =
\ (sc_s303 :: GHC.Prim.Int#)
(sc1_s304 :: GHC.Prim.Int#)
(sc2_s305 :: GHC.Prim.Int#) ->
case GHC.Prim.># sc1_s304 0 of _ {
GHC.Types.False -> sc_s303;
GHC.Types.True ->
case GHC.Prim.remInt# sc2_s305 2 of _ {
__DEFAULT ->
Main.main_$s$wfoldlM'_loop
sc_s303 (GHC.Prim.-# sc1_s304 1) (GHC.Prim.+# sc2_s305 1);
0 ->
Main.main_$s$wfoldlM'_loop
(GHC.Prim.+# sc_s303 sc2_s305)
(GHC.Prim.-# sc1_s304 1)
(GHC.Prim.+# sc2_s305 1)
}
}
end Rec }
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With