Here's haskell code
import GHC.Int
triples = [(x, y, z) | z <- [(1::Int32)..],
x <- [(1::Int32) .. z + 1],
y <- [x.. z + 1],
x * x + y * y == z * z]
main = mapM_ print (Prelude.take 1000 triples)
Which has following profile
triples +RTS -p -RTS
total time = 47.10 secs (47103 ticks @ 1000 us, 1 processor)
total alloc = 62,117,115,176 bytes (excludes profiling overheads)
COST CENTRE MODULE SRC %time %alloc
triples Main triples.hs:(5,1)-(8,46) 100.0 100.0
individual inherited
COST CENTRE MODULE SRC no. entries %time %alloc %time %alloc
MAIN MAIN <built-in> 118 0 0.0 0.0 100.0 100.0
CAF Main <entire-module> 235 0 0.0 0.0 100.0 100.0
main Main triples.hs:10:1-46 236 1 0.0 0.0 0.0 0.0
triples Main triples.hs:(5,1)-(8,46) 237 1 100.0 100.0 100.0 100.0
CAF GHC.Conc.Signal <entire-module> 227 0 0.0 0.0 0.0 0.0
CAF GHC.IO.Encoding <entire-module> 216 0 0.0 0.0 0.0 0.0
CAF GHC.IO.Encoding.Iconv <entire-module> 214 0 0.0 0.0 0.0 0.0
CAF GHC.IO.Handle.FD <entire-module> 206 0 0.0 0.0 0.0 0.0
CAF GHC.IO.Handle.Text <entire-module> 144 0 0.0 0.0 0.0 0.0
main Main triples.hs:10:1-46 238 0 0.0 0.0 0.0 0.0
While equivalent rust code runs an order of magnitude faster. Which seems very strange to me.
fn triples() -> impl Iterator<Item=(i32, i32, i32)> {
(1..).flat_map(|z| {
(1..z + 1).flat_map(move |x| {
(x..z + 1).filter_map(move |y| {
if x * x + y * y == z * z {
Some((x, y, z))
} else {
None
}
})
})
})
}
fn main() {
for triple in triples().take(1000) {
println!("{:?}", triple);
// unsafe {printf("(%i, %i, %i)\n".as_ptr() as *const i8, x, y, z)};
}
}
Results are
[I] ~/c/pythagoras (master|✚1…) $ time ./range > /dev/null
0.16user 0.00system 0:00.16elapsed 100%CPU (0avgtext+0avgdata 2248maxresident)k
0inputs+0outputs (0major+124minor)pagefaults 0swaps
[I] ~/c/pythagoras (master|✚1…) $ time ./triples > /dev/null
2.39user 0.00system 0:02.39elapsed 99%CPU (0avgtext+0avgdata 4736maxresident)k
0inputs+0outputs (0major+473minor)pagefaults 0swaps
Both results are with -O3 flag.
Is it possible to optimize allocations out while saving idiomatic haskell code? Maybe some fusion library or something can do this?
EDIT1. Okay using Int instead of Int32 or Int64 makes the code faster which is nice. Still with fflvm it's twice slower than rust and judging by profile it's still spends most of the time in allocations. What's preventing haskell from reusing the triple for example and not allocating it only once?
There are two issues with your code:
For performance, you should compile without profiling, and with optimization. Profiling adds significant overhead. On my system ghc -prof results in a runtime of over 40 seconds, similarly to your time. ghc -O2 without -prof yields just 4.2 seconds.
Using Int32 on a 64 bit system. You shouldn't do this, because non-native sized Int operations are compiled to slow out-of-line primops. When I change Int32 to Int, runtime becomes 0.44 seconds. If I additionally use -fllvm for the LLVM code backend, I get 0.2 seconds.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With