
Difference in performance of compiled accelerate code ran from ghci and shell

Tags: performance, profiling, haskell, ghci, accelerate-haskell

Problem

Hello, I'm using the accelerate library to build an application that lets the user interactively call image-processing functions, which is why I'm building on and extending ghci using the GHC API.

The problem is that when running the compiled executable from the shell the computations finish in under 100ms (slightly less than 80), while running the same compiled code within ghci takes over 100ms (a bit more than 140 on average).

Resources

sample code + execution logs: https://gist.github.com/zgredzik/15a437c87d3d8d03b8fc

Description

First of all: the tests were run after the CUDA kernel was compiled (the compilation itself added an additional 2 seconds, but that's not the issue here).

When running the compiled executable from the shell the computations are done in under 10ms (shell first run and second shell run were passed different arguments to make sure the data wasn't cached anywhere).

When trying to run the same code from ghci and fiddling with the input data, the computations take over 100ms. I understand that interpreted code is slower than compiled code, but I'm loading the same compiled code within the ghci session and calling the same top-level binding (packedFunction). I've explicitly typed it to make sure it is specialized (same results as using the SPECIALIZE pragma).
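For readers unfamiliar with the two approaches mentioned here, the following is a minimal sketch of what "explicitly typing" a binding versus using a SPECIALIZE pragma can look like. The real definition of packedFunction is not shown in the question, so the body below and the helper scaleAll are purely hypothetical stand-ins.

```haskell
-- Hypothetical polymorphic worker (not from the question). Without further
-- hints, GHC may compile it to take a Num dictionary at run time.
scaleAll :: Num a => a -> [a] -> [a]
scaleAll k = map (* k)
-- Option 1: ask GHC to also generate a Double-specific copy of the worker.
{-# SPECIALIZE scaleAll :: Double -> [Double] -> [Double] #-}

-- Option 2: give the top-level binding a monomorphic type, so every use of
-- the worker inside it is at a single known type.
-- (The name packedFunction comes from the question; this body is assumed.)
packedFunction :: [Double] -> [Double]
packedFunction = scaleAll 2

main :: IO ()
main = print (packedFunction [1, 2, 3])
```

Either way the call is resolved at a concrete type at compile time, which is why the question treats the two as equivalent for timing purposes.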

However, the computations do take less than 10ms if I run the main function in ghci (even when changing the input data with :set args between consecutive calls).

I compiled Main.hs with ghc -o main Main.hs -O2 -dynamic -threaded

I'm wondering where the overhead comes from. Does anyone have any suggestions as to why this is happening?

A simplified version of the example posted by remdezx:

    {-# LANGUAGE OverloadedStrings #-}

    module Main where

    import Data.Array.Accelerate      as A
    import Data.Array.Accelerate.CUDA as C
    import Data.Time.Clock            (diffUTCTime, getCurrentTime)

    main :: IO ()
    main = do
        start <- getCurrentTime
        print $ C.run $ A.maximum $ A.map (+1) $ A.use (fromList (Z:.1000000) [1..1000000] :: Vector Double)
        end   <- getCurrentTime
        print $ diffUTCTime end start

When I compile and execute it, it takes about 0.09s to finish.

    $ ghc -O2 Main.hs -o main -threaded
    [1 of 1] Compiling Main             ( Main.hs, Main.o )
    Linking main ...
    $ ./main
    Array (Z) [1000001.0]
    0.092906s

But when I precompile it and run it in the interpreter, it takes about 0.26s.

    $ ghc -O2 Main.hs -c -dynamic
    $ ghci Main
    ghci> main
    Array (Z) [1000001.0]
    0.258224s


asked Dec 18 '14 07:12

Konrad Madej

1 Answer

I investigated accelerate and accelerate-cuda and added some debug code to measure time both under ghci and in a compiled, optimised version.
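The instrumentation itself is not included in the answer. As a rough idea of what such debug code could look like, here is a minimal sketch that prints an entry marker, runs an action, and then prints the CPU and wall-clock time it took. The names timed and formatExit and the explicit depth argument are assumptions made for illustration, not the code that was actually inserted into accelerate-cuda.

```haskell
import Control.Exception (evaluate)
import Data.Time.Clock   (NominalDiffTime, diffUTCTime, getCurrentTime)
import System.CPUTime    (getCPUTime)

-- | Format the line printed when a timed section finishes
-- (hypothetical helper; indentation grows with nesting depth).
formatExit :: Int -> String -> Double -> NominalDiffTime -> String
formatExit depth label cpu wall =
    "<<<<<" ++ replicate (depth + 1) ' ' ++ label
        ++ ": " ++ show cpu ++ " CPU  " ++ show wall ++ " TOTAL"

-- | Run an action, printing entry and exit markers together with the
-- CPU time and wall-clock time spent inside it.
timed :: Int -> String -> IO a -> IO a
timed depth label action = do
    putStrLn (">>>>>" ++ replicate (depth + 1) ' ' ++ label)
    cpu0   <- getCPUTime          -- CPU time used so far, in picoseconds
    wall0  <- getCurrentTime
    result <- action
    cpu1   <- getCPUTime
    wall1  <- getCurrentTime
    let cpuSeconds = fromIntegral (cpu1 - cpu0) / 1e12 :: Double
    putStrLn (formatExit depth label cpuSeconds (diffUTCTime wall1 wall0))
    return result

main :: IO ()
main = do
    total <- timed 0 "outer" $ do
        n <- timed 1 "build list" (evaluate (length [1 .. 1000000 :: Int]))
        timed 1 "add one" (evaluate (n + 1))
    print total
```

Note that a wrapper like this only times what the action actually forces, so lazy values need an explicit evaluate or seq inside the timed section, which is presumably why several of the logged sections are named "seq ...".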

The results are below; they show the call trace and the execution time of each step.

ghci run

    $ ghc -O2 -dynamic -c -threaded Main.hs && ghci
    GHCi, version 7.8.3: http://www.haskell.org/ghc/  :? for help
    …
    Loading package ghc-prim ... linking ... done.
    Loading package integer-gmp ... linking ... done.
    Loading package base ... linking ... done.
    Ok, modules loaded: Main.
    Prelude Main> Loading package transformers-0.3.0.0 ... linking ... done.
    …
    Loading package array-0.5.0.0 ... linking ... done.
    (...)
    Loading package accelerate-cuda-0.15.0.0 ... linking ... done.
    >>>>> run
    >>>>> runAsyncIn.execute
    >>>>>  runAsyncIn.seq ctx
    <<<<<  runAsyncIn.seq ctx: 4.1609e-2 CPU  0.041493s TOTAL
    >>>>>  runAsyncIn.seq a
    <<<<<  runAsyncIn.seq a: 1.0e-6 CPU  0.000001s TOTAL
    >>>>>  runAsyncIn.seq acc
    >>>>>   convertAccWith True
    <<<<<   convertAccWith: 0.0 CPU  0.000017s TOTAL
    <<<<<  runAsyncIn.seq acc: 2.68e-4 CPU  0.000219s TOTAL
    >>>>>  evalCUDA
    >>>>>   push
    <<<<<   push: 0.0 CPU  0.000002s TOTAL
    >>>>>   evalStateT
    >>>>>    runAsyncIn.compileAcc
    >>>>>     compileOpenAcc
    >>>>>      compileOpenAcc.traveuseAcc.Alet
    >>>>>      compileOpenAcc.traveuseAcc.Use
    >>>>>       compileOpenAcc.traveuseAcc.use3
    >>>>>       compileOpenAcc.traveuseAcc.use1
    <<<<<       compileOpenAcc.traveuseAcc.use1: 0.0 CPU  0.000001s TOTAL
    >>>>>       compileOpenAcc.traveuseAcc.use2
    >>>>>        compileOpenAcc.traveuseAcc.seq arr
    <<<<<        compileOpenAcc.traveuseAcc.seq arr: 0.105716 CPU  0.105501s TOTAL
    >>>>>        useArrayAsync
    <<<<<        useArrayAsync: 1.234e-3 CPU  0.001505s TOTAL
    <<<<<       compileOpenAcc.traveuseAcc.use2: 0.108012 CPU  0.108015s TOTAL
    <<<<<       compileOpenAcc.traveuseAcc.use3: 0.108539 CPU  0.108663s TOTAL
    <<<<<      compileOpenAcc.traveuseAcc.Use: 0.109375 CPU  0.109005s TOTAL
    >>>>>      compileOpenAcc.traveuseAcc.Fold1
    >>>>>      compileOpenAcc.traveuseAcc.Avar
    <<<<<      compileOpenAcc.traveuseAcc.Avar: 0.0 CPU  0.000001s TOTAL
    >>>>>      compileOpenAcc.traveuseAcc.Avar
    <<<<<      compileOpenAcc.traveuseAcc.Avar: 0.0 CPU  0s TOTAL
    >>>>>      compileOpenAcc.traveuseAcc.Avar
    <<<<<      compileOpenAcc.traveuseAcc.Avar: 0.0 CPU  0.000001s TOTAL
    >>>>>      compileOpenAcc.traveuseAcc.Avar
    <<<<<      compileOpenAcc.traveuseAcc.Avar: 0.0 CPU  0s TOTAL
    <<<<<      compileOpenAcc.traveuseAcc.Fold1: 2.059e-3 CPU  0.002384s TOTAL
    <<<<<      compileOpenAcc.traveuseAcc.Alet: 0.111434 CPU  0.112034s TOTAL
    <<<<<     compileOpenAcc: 0.11197 CPU  0.112615s TOTAL
    <<<<<    runAsyncIn.compileAcc: 0.11197 CPU  0.112833s TOTAL
    >>>>>    runAsyncIn.dumpStats
    <<<<<    runAsyncIn.dumpStats: 2.0e-6 CPU  0.000001s TOTAL
    >>>>>    runAsyncIn.executeAcc
    >>>>>     executeAcc
    <<<<<     executeAcc: 8.96e-4 CPU  0.00049s TOTAL
    <<<<<    runAsyncIn.executeAcc: 9.36e-4 CPU  0.0007s TOTAL
    >>>>>    runAsyncIn.collect
    <<<<<    runAsyncIn.collect: 0.0 CPU  0.000027s TOTAL
    <<<<<   evalStateT: 0.114156 CPU  0.115327s TOTAL
    >>>>>   pop
    <<<<<   pop: 0.0 CPU  0.000002s TOTAL
    >>>>>   performGC
    <<<<<   performGC: 5.7246e-2 CPU  0.057814s TOTAL
    <<<<<  evalCUDA: 0.17295 CPU  0.173943s TOTAL
    <<<<< runAsyncIn.execute: 0.215475 CPU  0.216563s TOTAL
    <<<<< run: 0.215523 CPU  0.216771s TOTAL
    Array (Z) [1000001.0]
    0.217148s
    Prelude Main> Leaving GHCi.

compiled code run

    $ ghc -O2 -threaded Main.hs && ./Main
    [1 of 1] Compiling Main             ( Main.hs, Main.o )
    Linking Main ...
    >>>>> run
    >>>>> runAsyncIn.execute
    >>>>>  runAsyncIn.seq ctx
    <<<<<  runAsyncIn.seq ctx: 4.0639e-2 CPU  0.041498s TOTAL
    >>>>>  runAsyncIn.seq a
    <<<<<  runAsyncIn.seq a: 1.0e-6 CPU  0.000001s TOTAL
    >>>>>  runAsyncIn.seq acc
    >>>>>   convertAccWith True
    <<<<<   convertAccWith: 1.2e-5 CPU  0.000005s TOTAL
    <<<<<  runAsyncIn.seq acc: 1.15e-4 CPU  0.000061s TOTAL
    >>>>>  evalCUDA
    >>>>>   push
    <<<<<   push: 2.0e-6 CPU  0.000002s TOTAL
    >>>>>   evalStateT
    >>>>>    runAsyncIn.compileAcc
    >>>>>     compileOpenAcc
    >>>>>      compileOpenAcc.traveuseAcc.Alet
    >>>>>      compileOpenAcc.traveuseAcc.Use
    >>>>>       compileOpenAcc.traveuseAcc.use3
    >>>>>       compileOpenAcc.traveuseAcc.use1
    <<<<<       compileOpenAcc.traveuseAcc.use1: 0.0 CPU  0.000001s TOTAL
    >>>>>       compileOpenAcc.traveuseAcc.use2
    >>>>>        compileOpenAcc.traveuseAcc.seq arr
    <<<<<        compileOpenAcc.traveuseAcc.seq arr: 3.6651e-2 CPU  0.03712s TOTAL
    >>>>>        useArrayAsync
    <<<<<        useArrayAsync: 1.427e-3 CPU  0.001427s TOTAL
    <<<<<       compileOpenAcc.traveuseAcc.use2: 3.8776e-2 CPU  0.039152s TOTAL
    <<<<<       compileOpenAcc.traveuseAcc.use3: 3.8794e-2 CPU  0.039207s TOTAL
    <<<<<      compileOpenAcc.traveuseAcc.Use: 3.8808e-2 CPU  0.03923s TOTAL
    >>>>>      compileOpenAcc.traveuseAcc.Fold1
    >>>>>      compileOpenAcc.traveuseAcc.Avar
    <<<<<      compileOpenAcc.traveuseAcc.Avar: 2.0e-6 CPU  0.000001s TOTAL
    >>>>>      compileOpenAcc.traveuseAcc.Avar
    <<<<<      compileOpenAcc.traveuseAcc.Avar: 2.0e-6 CPU  0.000001s TOTAL
    >>>>>      compileOpenAcc.traveuseAcc.Avar
    <<<<<      compileOpenAcc.traveuseAcc.Avar: 0.0 CPU  0.000001s TOTAL
    >>>>>      compileOpenAcc.traveuseAcc.Avar
    <<<<<      compileOpenAcc.traveuseAcc.Avar: 0.0 CPU  0.000001s TOTAL
    <<<<<      compileOpenAcc.traveuseAcc.Fold1: 1.342e-3 CPU  0.001284s TOTAL
    <<<<<      compileOpenAcc.traveuseAcc.Alet: 4.0197e-2 CPU  0.040578s TOTAL
    <<<<<     compileOpenAcc: 4.0248e-2 CPU  0.040895s TOTAL
    <<<<<    runAsyncIn.compileAcc: 4.0834e-2 CPU  0.04103s TOTAL
    >>>>>    runAsyncIn.dumpStats
    <<<<<    runAsyncIn.dumpStats: 0.0 CPU  0s TOTAL
    >>>>>    runAsyncIn.executeAcc
    >>>>>     executeAcc
    <<<<<     executeAcc: 2.87e-4 CPU  0.000403s TOTAL
    <<<<<    runAsyncIn.executeAcc: 2.87e-4 CPU  0.000488s TOTAL
    >>>>>    runAsyncIn.collect
    <<<<<    runAsyncIn.collect: 9.2e-5 CPU  0.000049s TOTAL
    <<<<<   evalStateT: 4.1213e-2 CPU  0.041739s TOTAL
    >>>>>   pop
    <<<<<   pop: 0.0 CPU  0.000002s TOTAL
    >>>>>   performGC
    <<<<<   performGC: 9.41e-4 CPU  0.000861s TOTAL
    <<<<<  evalCUDA: 4.3308e-2 CPU  0.042893s TOTAL
    <<<<< runAsyncIn.execute: 8.5154e-2 CPU  0.084815s TOTAL
    <<<<< run: 8.5372e-2 CPU  0.085035s TOTAL
    Array (Z) [1000001.0]
    0.085169s

As we can see, there are two major problems: the evaluation of fromList (Z:.1000000) [1..1000000] :: Vector Double, which takes 69 ms extra under ghci (106 ms - 37 ms), and the performGC call, which takes 57 ms extra (58 ms - 1 ms). Together these account for about 126 ms of the roughly 132 ms difference between the ghci run (217 ms) and the compiled run (85 ms).
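The arithmetic can be checked directly from the two logs. The sketch below copies the wall-clock ("TOTAL") figures for the three relevant log entries and prints the per-step overhead in milliseconds; the record and function names are made up for this illustration.

```haskell
import Numeric (showFFloat)

-- Wall-clock ("TOTAL") timings in seconds, copied from the two logs above.
data Timing = Timing
    { seqArr    :: Double  -- compileOpenAcc.traveuseAcc.seq arr
    , performGc :: Double  -- performGC
    , runTotal  :: Double  -- run
    }

ghciRun, compiledRun :: Timing
ghciRun     = Timing 0.105501 0.057814 0.216771
compiledRun = Timing 0.03712  0.000861 0.085035

-- | Extra time spent under ghci for one field, in milliseconds.
overheadMs :: (Timing -> Double) -> Double
overheadMs field = (field ghciRun - field compiledRun) * 1000

-- | Print one labelled overhead, rounded to three decimal places.
report :: String -> Double -> IO ()
report label ms = putStrLn (label ++ ": " ++ showFFloat (Just 3) ms " ms")

main :: IO ()
main = do
    report "seq arr overhead  " (overheadMs seqArr)
    report "performGC overhead" (overheadMs performGc)
    report "total overhead    " (overheadMs runTotal)
    -- Whatever is left after subtracting the two big contributors.
    report "unaccounted       "
        (overheadMs runTotal - overheadMs seqArr - overheadMs performGc)
```

The remainder that is not explained by these two steps comes out at only a few milliseconds, which supports focusing on array evaluation and garbage collection.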

I suppose that in a compiled program the RTS manages memory differently than in ghci, so allocation and GC can be faster. We can also test this part in isolation by evaluating the code below (it does not require CUDA at all):

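    import Data.Array.Accelerate.Array.Sugar
    import Data.Time.Clock                   (diffUTCTime, getCurrentTime)
    import System.Mem                        (performGC)

    main :: IO ()
    main = do
        -- Force the array to be built, without involving CUDA.
        measure $ seq (fromList (Z:.1000000) [1..1000000] :: Vector Double) $ return ()
        -- Time a full garbage collection on its own.
        measure $ performGC

    -- Run an action and print the wall-clock time it took.
    measure :: IO () -> IO ()
    measure action = do
        start <- getCurrentTime
        action
        end   <- getCurrentTime
        print $ diffUTCTime end start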

Results:

Evaluating the vector takes 0.121653s under ghci and 0.035162s in the compiled version.
performGC takes 0.044876s under ghci and 0.00031s in the compiled version.

This could be another question, but maybe someone knows: can we somehow tune the garbage collector to work faster under ghci?


answered Sep 24 '22 08:09

remdezx
