Anything wrong with my Fisher-Yates shuffle?

Tags: shuffle, haskell

Aware that when something seems too good to be true it usually is, I figured I would pose this question to hopefully flush out any gremlins. I reviewed the few related threads that I could find, but still my question lingers.

I am relatively new to Haskell, and in my experimentation I coded up a basic Fisher-Yates shuffle as shown below.

shuffle :: RandomGen g => [a] -> g -> ([a],g)
shuffle [] g0 = ([],g0)
shuffle [x] g0 = ([x],g0)
shuffle xs g0 = (x:newtail,g2)
  where (i,g1) = randomR (0, length $ tail xs) g0
        (xs1,x:xs2) = splitAt i xs
        (newtail,g2) = shuffle (xs1++xs2) g1

This implementation of course uses beaucoup memory for large lists, but it's fast: on my laptop it averages 5s for 30M ints, vs. 2.3s for the standard C++ shuffle. In fact, it is much faster than the other Haskell implementations I have found elsewhere (e.g., http://www.haskell.org/haskellwiki/Random_shuffle).

Given that the other Haskell shuffles I've seen are both more complicated and slower, I am wondering whether the speedup and simplicity are simply my reward for being an unapologetic memory hog, or whether I have missed some tiny but crucial detail that makes my algorithm biased. I have not tested extensively, but a preliminary look seems to show a uniform distribution of permutations.
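For small lists, the bias question can be checked exactly rather than by sampling. The sketch below is an illustration, not part of the original post: `shuffleWith`, `choices` and `permutationCounts` are hypothetical names, and the random draw is replaced by an explicit list of indices so that every possible sequence of draws can be enumerated. It assumes `randomR` is uniform over its range, so it tests only the selection logic, not the generator.

```haskell
import Data.List (group, sort)

-- Same selection structure as the shuffle above, but each index is
-- supplied explicitly instead of being drawn from a generator.
shuffleWith :: [Int] -> [a] -> [a]
shuffleWith _ [] = []
shuffleWith _ [x] = [x]
shuffleWith (i:is) xs = x : shuffleWith is (xs1 ++ xs2)
  where (xs1, x:xs2) = splitAt i xs
shuffleWith [] xs = xs  -- not reached when enough indices are supplied

-- Every index sequence the shuffle could draw for a list of length n:
-- the first index lies in [0, n-1], the next in [0, n-2], and so on.
choices :: Int -> [[Int]]
choices n = mapM (\k -> [0 .. k]) [n - 1, n - 2 .. 1]

-- How often each permutation of [1..n] appears over all index sequences.
permutationCounts :: Int -> [([Int], Int)]
permutationCounts n =
  [ (p, length g)
  | g@(p:_) <- group (sort [shuffleWith is [1 .. n] | is <- choices n]) ]

main :: IO ()
main = mapM_ print (permutationCounts 3)
```

If every permutation appears exactly once, then each of the n! equally likely index sequences yields a distinct permutation, which is what an unbiased shuffle requires.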

I would appreciate the assessment of more eyes with more Haskell and/or shuffling experience. Many thanks in advance to all who take the time to reply.

asked Apr 26 '13 18:04

Tientuinë

1 Answer

Let's do some proper benchmarking. Here's some code, with your shuffle renamed to shuffle1, and my personal favorite variant thrown in as shuffle2.

import System.Random

import Control.Monad

import Control.Monad.ST.Strict
import Data.STRef.Strict

import Data.Vector.Mutable

import Prelude as P

import Criterion.Main


shuffle1 :: RandomGen g => [a] -> g -> ([a], g)
shuffle1 [] g0 = ([],g0)
shuffle1 [x] g0 = ([x],g0)
shuffle1 xs g0 = (x:newtail,g2)
  where (i,g1) = randomR (0, P.length $ P.tail xs) g0
        (xs1,x:xs2) = P.splitAt i xs
        (newtail,g2) = shuffle1 (xs1++xs2) g1


shuffle2 :: RandomGen g => [a] -> g -> ([a], g)
shuffle2 xs g0 = runST $ do
    let l = P.length xs
    v <- new l
    sequence_ $ zipWith (unsafeWrite v) [0..] xs

    let loop g i | i <= 1 = return g
                 | otherwise = do
            let i' = i - 1
                (j, g') = randomR (0, i') g
            unsafeSwap v i' j
            loop g' i'

    gFinal <- loop g0 l
    shuffled <- mapM (unsafeRead v) [0 .. l - 1]
    return (shuffled, gFinal)


main = do
    let s1 x = fst $ shuffle1 x g0
        s2 x = fst $ shuffle2 x g0
        arr = [0..1000] :: [Int]
        g0 = mkStdGen 0
    -- make sure these values are evaluated before the benchmark starts
    print (g0, arr)

    defaultMain [bench "shuffle1" $ nf s1 arr, bench "shuffle2" $ nf s2 arr]

And so, let's see some results:

carl@ubuntu:~/hask$ ghc -O2 shuffle.hs
[1 of 1] Compiling Main             ( shuffle.hs, shuffle.o )
Linking shuffle ...
carl@ubuntu:~/hask$ ./shuffle 
(1 1,[0, .. <redacted for brevity>])
warming up
estimating clock resolution...
mean is 5.762060 us (160001 iterations)
found 4887 outliers among 159999 samples (3.1%)
  4751 (3.0%) high severe
estimating cost of a clock call...
mean is 42.13314 ns (43 iterations)

benchmarking shuffle1
mean: 10.95922 ms, lb 10.92317 ms, ub 10.99903 ms, ci 0.950
std dev: 193.8795 us, lb 168.6842 us, ub 244.6648 us, ci 0.950
found 1 outliers among 100 samples (1.0%)
variance introduced by outliers: 10.396%
variance is moderately inflated by outliers

benchmarking shuffle2
mean: 256.9394 us, lb 255.5414 us, ub 258.7409 us, ci 0.950
std dev: 8.042766 us, lb 6.460785 us, ub 12.28447 us, ci 0.950
found 1 outliers among 100 samples (1.0%)
  1 (1.0%) high severe
variance introduced by outliers: 26.750%
variance is moderately inflated by outliers

OK, my system is really noisy and shouldn't be used for serious benchmarking of things with similar numbers. But that hardly matters here: shuffle2 is approximately 40x faster than shuffle1 on a list with 1001 elements. Because shuffle2 is O(n) and shuffle1 is O(n^2), that gap will only widen with larger lists. I'm certain that whatever your test code was timing, it wasn't the shuffle algorithm.
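To make the O(n) versus O(n^2) gap concrete, here is a rough cost model. It is a back-of-the-envelope sketch, not a measurement: `listShuffleSteps` and `arrayShuffleSteps` are hypothetical names, and the model assumes the chosen index averages out to the middle of the remaining list.

```haskell
-- Rough cost model: counts list cells or array slots touched.

-- List version: a call on k elements walks the list once for `length`
-- (about k - 1 cells) and, for an index i averaging (k - 1) / 2, about i
-- cells each for `splitAt` and `++`. That is roughly 2 * (k - 1) per call.
listShuffleSteps :: Int -> Integer
listShuffleSteps n = sum [2 * (k - 1) | k <- [2 .. toInteger n]]

-- Array version: n writes to fill the vector, n - 1 swaps, n reads.
arrayShuffleSteps :: Int -> Integer
arrayShuffleSteps n = toInteger (n + max 0 (n - 1) + n)

main :: IO ()
main = mapM_ report [1001, 10000, 100000]
  where
    report n = putStrLn $
      show n ++ " elements: list ~" ++ show (listShuffleSteps n)
             ++ " steps, array ~" ++ show (arrayShuffleSteps n) ++ " steps"
```

The model ignores constant factors, so it does not predict the measured ratio. It only shows that the list version's work grows with the square of the length while the array version's grows linearly.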

Actually, I have a guess. Your version is lazy enough to return results incrementally. 5 seconds is a plausible period of time for getting the first few results, if you never touch the generator after the call to it. Maybe that's what's going on in your timing.
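The laziness guess can be illustrated with a small sketch. This is an assumption-laden stand-in, not the original code: `lazyShuffle` is a hypothetical variant of the list-based shuffle in which an explicit index list plays the role of the generator, and the leftover indices play the role of the returned generator. The index list ends in an `error`, so the program would crash if more than two draws were ever demanded.

```haskell
-- Index-driven stand-in for the list-based shuffle: the index list plays
-- the role of the generator, the leftover indices that of the final one.
lazyShuffle :: [Int] -> [a] -> ([a], [Int])
lazyShuffle is [] = ([], is)
lazyShuffle is [x] = ([x], is)
lazyShuffle (i:is) xs = (x : rest, is')
  where (xs1, x:xs2) = splitAt i xs
        (rest, is')  = lazyShuffle is (xs1 ++ xs2)
lazyShuffle [] xs = (xs, [])  -- not reached when enough indices are supplied

main :: IO ()
main = do
  -- Only two draws are available; demanding a third would raise the error.
  let indices = 1 : 0 : error "index stream forced past the second draw"
      (shuffled, _leftover) = lazyShuffle indices [1 .. 5 :: Int]
  -- Taking two elements performs only two shuffle steps.
  print (take 2 shuffled)
```

Printing `_leftover` instead would force every step, which mirrors the point above: as long as the final generator is never touched, only the demanded prefix of the shuffle is actually computed.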

answered Sep 29 '22 11:09

Carl
