In Haskell, I've found three simple implementations of the Sieve of Eratosthenes on the Rosetta Code page. Now my question is, which one should be used in which situations? Correcting my initial reasoning would be helpful too: I'm assuming the List one is the most idiomatic and easy to read for a Haskeller. Is it correct, though? I'm wondering if it suffers from the same problems as another list-based sieve that I then learned was not actually implementing the algorithm: (edit: shown here is the list-based sieve I know has problems, not the one from RosettaCode, which I pasted at the bottom) <pre class="prettyprint"><code>primes = sieve [2..] where sieve (p:x) = p : sieve [ n | n <- x, n `mod` p > 0 ] </code></pre> In terms of performance, the immutable Array seems to be the winner. With an upper bound <code>m</code> of <code>2000000</code>, the times were about: <ul> <li>1.3s for List</li> <li>0.6s for Array</li> <li>1.8s for Mutable Array</li> </ul> So I'd pick Array for performance. And of course, the Mutable Array one is also easy to reason about since I have a more imperative language background. I'm not sure why I would pick this one if I'm coding in Haskell, though, since it's both slower than the others and non-idiomatic. Code copied here for reference: List: <pre class="prettyprint"><code>primesTo m = 2 : eratos [3,5..m] where eratos (p : xs) | p*p>m = p : xs | True = p : eratos (xs `minus` [p*p, p*p+2*p..]) minus a@(x:xs) b@(y:ys) = case compare x y of LT -> x : minus xs b EQ -> minus xs ys GT -> minus a ys minus a b = a </code></pre> Immutable Array: <pre class="prettyprint"><code>import Data.Array.Unboxed primesToA m = sieve 3 (array (3,m) [(i,odd i) | i<-[3..m]] :: UArray Int Bool) where sieve p a | p*p > m = 2 : [i | (i,True) <- assocs a] | a!p = sieve (p+2) $ a//[(i,False) | i <- [p*p, p*p+2*p..m]] | otherwise = sieve (p+2) a </code></pre> Mutable Array: <pre class="prettyprint"><code>import Control.Monad (forM_, when) import Control.Monad.ST import Data.Array.ST import Data.Array.Unboxed primeSieve :: Integer -> UArray Integer Bool primeSieve top = runSTUArray $ do a <- newArray (2,top) True -- :: ST s (STUArray s Integer Bool) let r = ceiling . sqrt $ fromInteger top forM_ [2..r] $ \i -> do ai <- readArray a i when ai $ do forM_ [i*i,i*i+i..top] $ \j -> do writeArray a j False return a -- Return primes from sieve as list: primesTo :: Integer -> [Integer] primesTo top = [p | (p,True) <- assocs $ primeSieve top] </code></pre> EDIT <ul> <li>I showed Turner's Sieve at the top but that's not the list-based example I'm comparing with the other two. I just wanted to know if the list-based example suffers from the same "not the correct Sieve of Eratosthenes" problems as Turner's.</li> <li>It appears the performance comparison is unfair because the Mutable Array example goes through evens as well and uses <code>Integer</code> rather than <code>Int</code>...</li> </ul>

Mutable array will always be the winner in terms of performance (and you really should've copied the version that works on odds only as a minimum; it should be the fastest of the three - also because it uses <code>Int</code> and not <code>Integer</code>). For lists, tree-shaped merging incremental sieve should perform better than the one you show. You can always use it with <code>takeWhile (< limit)</code> if needed. I contend that it conveys the true nature of the sieve most clearly: <pre class="prettyprint"><code>import Data.List (unfoldr) primes :: [Int] primes = 2 : _Y ((3 :) . gaps 5 . _U . map (\p -> [p*p, p*p+2*p..])) _Y g = g (_Y g) -- recursion _U ((x:xs):t) = (x :) . union xs . _U -- ~= nub . sort . concat . unfoldr (\(a:b:c) -> Just (union a b, c)) $ t gaps k s@(x:xs) | k < x = k : gaps (k+2) s -- ~= [k,k+2..]\\s, when | otherwise = gaps (k+2) xs -- k<=x && null(s\\[k,k+2..]) union a@(x:xs) b@(y:ys) = case compare x y of -- ~= nub . sort .: (++) LT -> x : union xs b EQ -> x : union xs ys GT -> y : union a ys </code></pre> <code>_U</code> reimplements <code>Data.List.Ordered.unionAll</code>, and <code>gaps 5</code> is <code>(minus [5,7..])</code>, fused for efficiency, with <code>minus</code> and <code>union</code> from the same package. Of course nothing beats the brevity of <code>Data.List.nubBy (((>1).).gcd) [2..]</code> (but it's very slow). To your 1st new question: not. It does find the multiples by counting up, as any true sieve should (although "minus" on lists is of course under-performant; the above improves on that by re-arranging a linear subtraction chain <code>((((xs-a)-b)-c)- ... )</code> into a subtraction of tree-folded additions, <code>xs-(a+((b+c)+...))</code>).

What's the ideal implementation for the Sieve of Eratosthenes between Lists, Arrays, and Mutable Arrays?

Tags:

performance

haskell

primes

sieve-of-eratosthenes

In Haskell, I've found three simple implementations of the Sieve of Eratosthenes on the Rosetta Code page.

Now my question is, which one should be used in which situations?

Correcting my initial reasoning would be helpful too:

I'm assuming the List one is the most idiomatic and easy to read for a Haskeller. Is it correct, though? I'm wondering if it suffers from the same problems as another list-based sieve that I then learned was not actually implementing the algorithm:
(edit: shown here is the list-based sieve I know has problems, not the one from RosettaCode, which I pasted at the bottom)

primes = sieve [2..] where
         sieve (p:x) = p : sieve [ n | n <- x, n `mod` p > 0 ]

In terms of performance, the immutable Array seems to be the winner. With an upper bound m of 2000000, the times were about:

1.3s for List
0.6s for Array
1.8s for Mutable Array

So I'd pick Array for performance.

And of course, the Mutable Array one is also easy to reason about since I have a more imperative language background. I'm not sure why I would pick this one if I'm coding in Haskell, though, since it's both slower than the others and non-idiomatic.

Code copied here for reference:

List:

primesTo m = 2 : eratos [3,5..m] where
eratos (p : xs) | p*p>m = p : xs
                | True  = p : eratos (xs `minus` [p*p, p*p+2*p..])

minus a@(x:xs) b@(y:ys) = case compare x y of
     LT -> x : minus  xs b
     EQ ->     minus  xs ys
     GT ->     minus  a  ys
minus a        b        = a

Immutable Array:

import Data.Array.Unboxed

primesToA m = sieve 3 (array (3,m) [(i,odd i) | i<-[3..m]] :: UArray Int Bool)
  where
    sieve p a 
      | p*p > m   = 2 : [i | (i,True) <- assocs a]
      | a!p       = sieve (p+2) $ a//[(i,False) | i <- [p*p, p*p+2*p..m]]
      | otherwise = sieve (p+2) a

Mutable Array:

import Control.Monad (forM_, when)
import Control.Monad.ST
import Data.Array.ST
import Data.Array.Unboxed

primeSieve :: Integer -> UArray Integer Bool
primeSieve top = runSTUArray $ do
    a <- newArray (2,top) True           -- :: ST s (STUArray s Integer Bool)
    let r = ceiling . sqrt $ fromInteger top
    forM_ [2..r] $ \i -> do
      ai <- readArray a i
      when ai $ do
        forM_ [i*i,i*i+i..top] $ \j -> do
          writeArray a j False
    return a

-- Return primes from sieve as list:  
primesTo :: Integer -> [Integer]
primesTo top = [p | (p,True) <- assocs $ primeSieve top]

EDIT

I showed Turner's Sieve at the top but that's not the list-based example I'm comparing with the other two. I just wanted to know if the list-based example suffers from the same "not the correct Sieve of Eratosthenes" problems as Turner's.
It appears the performance comparison is unfair because the Mutable Array example goes through evens as well and uses Integer rather than Int...

739

asked Feb 22 '13 12:02

Hudon

2 Answers

This

primes = sieve [2..] where
         sieve (p:x) = p : sieve [ n | n <- x, n `mod` p > 0 ]

is not a sieve. It's very inefficient trial division. Don't use that!

I'm curious about how you got your times, there is no way that the Turner "sieve" could produce the primes not exceeding 2,000,000 in mere seconds. Letting it find the primes to 200,000 took

MUT     time    6.38s  (  6.39s elapsed)
GC      time    9.19s  (  9.20s elapsed)
EXIT    time    0.00s  (  0.00s elapsed)
Total   time   15.57s  ( 15.59s elapsed)

on my box (64-bit Linux, ghc-7.6.1, compiled with -O2). The complexity of that algorithm is O(N² / log² N), almost quadratic. Letting it proceed to 2,000,000 would take about twenty minutes.

Your times for the array versions are suspicious too, though in the other direction. Did you measure interpreted code?

Sieving to 2,000,000, compiled with optimisations, the mutable array code took 0.35 seconds to run, and the immutable array code 0.12 seconds.

Now, that still has the mutable array about three times slower than the immutable array.

But, it's an unfair comparison. For the immutable array, you used Ints, and for the mutable array Integers. Changing the mutable array code to use Ints - as it should, since under the hood, arrays are Int-indexed, so using Integer is an unnecessary performance sacrifice that buys nothing - made the mutable array code run in 0.15 seconds. Close to the mutable array code, but not quite there. However, you let the mutable array do more work, since in the immutable array code you only eliminate odd multiples of the odd primes, but in the mutable array code, you mark all multiples of all primes. Changing the mutable array code to treat 2 specially, and only eliminate odd multiples of odd primes brings that down to 0.12 seconds too.

But, you're using range-checked array indexing, which is slow, and, since the validity of the indices is checked in the code itself, unnecessary. Changing that to using unsafeRead and unsafeWrite brings down the time for the immutable array to 0.09 seconds.

Then you have the problem that using

forM_ [x, y .. z]

uses boxed Ints (fortunately, GHC eliminates the list). Writing a loop yourself, so that only unboxed Int#s are used, the time goes down to 0.02 seconds.

{-# LANGUAGE MonoLocalBinds #-}
import Control.Monad (forM_, when)
import Control.Monad.ST
import Data.Array.ST
import Data.Array.Unboxed
import Data.Array.Base

primeSieve :: Int -> UArray Int Bool
primeSieve top = runSTUArray $ do
    a <- newArray (0,top) True
    unsafeWrite a 0 False
    unsafeWrite a 1 False
    let r = ceiling . sqrt $ fromIntegral top
        mark step idx
            | top < idx = return ()
            | otherwise = do
                unsafeWrite a idx False
                mark step (idx+step)
        sift p
            | r < p     = return a
            | otherwise = do
                prim <- unsafeRead a p
                when prim $ mark (2*p) (p*p)
                sift (p+2)
    mark 2 4
    sift 3

-- Return primes from sieve as list:
primesTo :: Int -> [Int]
primesTo top = [p | (p,True) <- assocs $ primeSieve top]

main :: IO ()
main = print .last $ primesTo 2000000

So, wrapping up, for a Sieve of Eratosthenes, you should use an array - not surprising, its efficiency depends on being able to step from one multiple to the next in short constant time.

You get very simple and straightforward code with immutable arrays, and that code performs decently for not too high limits (it gets relatively worse for higher limits, since the arrays are still copied and garbage-collected, but that's not too bad).

When you need better performance, you need mutable arrays. Writing efficient mutable array code is not entirely trivial, one has to know how the compiler translates the various constructs to choose the right one, and some would consider such code unidiomatic. But you can also use a library (disclaimer: I wrote it) that provides a fairly efficient implementation rather than writing it yourself.

answered Oct 14 '22 16:10

Daniel Fischer

Mutable array will always be the winner in terms of performance (and you really should've copied the version that works on odds only as a minimum; it should be the fastest of the three - also because it uses Int and not Integer).

For lists, tree-shaped merging incremental sieve should perform better than the one you show. You can always use it with takeWhile (< limit) if needed. I contend that it conveys the true nature of the sieve most clearly:

import Data.List (unfoldr)

primes :: [Int]         
primes = 2 : _Y ((3 :) . gaps 5 . _U . map (\p -> [p*p, p*p+2*p..]))

_Y g = g (_Y g)                                -- recursion 
_U ((x:xs):t) = (x :) . union xs . _U          -- ~= nub . sort . concat
                      . unfoldr (\(a:b:c) -> Just (union a b, c)) $ t

gaps k s@(x:xs) | k < x     = k : gaps (k+2) s   -- ~= [k,k+2..]\\s, when 
                | otherwise =     gaps (k+2) xs  --  k<=x && null(s\\[k,k+2..])

union a@(x:xs) b@(y:ys) = case compare x y of  -- ~= nub . sort .: (++)
         LT -> x : union  xs  b
         EQ -> x : union  xs ys
         GT -> y : union  a  ys

_U reimplements Data.List.Ordered.unionAll, and gaps 5 is (minus [5,7..]), fused for efficiency, with minus and union from the same package.

Of course nothing beats the brevity of Data.List.nubBy (((>1).).gcd) [2..] (but it's very slow).

To your 1st new question: not. It does find the multiples by counting up, as any true sieve should _{(although "minus" on lists is of course under-performant; the above improves on that by re-arranging a linear subtraction chain ((((xs-a)-b)-c)- ... ) into a subtraction of tree-folded additions, xs-(a+((b+c)+...))).}

answered Oct 14 '22 16:10

Will Ness

Related questions
                            
                                Why can't I leverage 4GB of RAM in my computer to process less than 2GB of information in C#?
                            
                                SQL Server won't use my index
                            
                                Are solid-state drives good enough to stop worrying about disk IO bottlenecks?
                            
                                What is the InnoDB equivalent of MyISAM's key_buffer_size?
                            
                                Delete a LINQ to SQL record without loading it first
                            
                                Can running 'cat' speed up subsequent file random access on a linux box?
                            
                                What is the fastest language that runs on the JVM
                            
                                Slow performance using anonymous functions in MATLAB... have others noticed this?
                            
                                Tips to solve problem #41 of project euler
                            
                                Will Python use all processors in thread mode?
                            
                                Codeigniter: Profiling and Performance
                            
                                How does the `primary key` keyword relate to clustered indexes in SQL Server?
                            
                                How to create a list with specific size of elements
                            
                                What's the best way to cycle through a large number of fixed position images in WebKit efficiently?
                            
                                Garbage Collector taking too much CPU Time
                            
                                Multithreaded, read-only access to a Vector. Copy or lock?
                            
                                Performance between importing entire namespace versus using alias in C#?
                            
                                C++ Inline methods for performance
                            
                                Will the size of a java class impact the performance of the application
                            
                                Is the performance/memory benefit of short nullified by downcasting?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With