Aware that when something seems too good to be true it usually is, I figured I would pose this question to hopefully flush out any gremlins. I reviewed the few related threads that I could find, but still my question lingers.
I am relatively new to Haskell, and in my experimentation I coded up a basic Fisher-Yates shuffle as shown below.
shuffle :: RandomGen g => [a] -> g -> ([a],g)
shuffle [] g0 = ([],g0)
shuffle [x] g0 = ([x],g0)
shuffle xs g0 = (x:newtail,g2)
where (i,g1) = randomR (0, length $ tail xs) g0
(xs1,x:xs2) = splitAt i xs
(newtail,g2) = shuffle (xs1++xs2) g1
This implementation of course uses beaucoup memory for large lists, but it's fast - on my laptop avg 5s for 30M ints vs. Std C++ shuffle at 2.3s). In fact, it is much faster than other Haskell implementations have found elsewhere.(e.g., http://www.haskell.org/haskellwiki/Random_shuffle)
Given other Haskell shuffles I've seen are both more complicated and slower, I am wondering whether the speedup/simplicity is simply my reward for being a unapologetic memory hog, or if I have missed some tiny but crucial detail that makes my algorithm biased. I have not tested extensively, but a preliminary look seems to show a uniform distribution of permutations.
I would appreciate the assessment of more eyes with more Haskell and/or shuffling experience. Many thanks in advance to all who take the time to reply.
The Fisher–Yates shuffle, as implemented by Durstenfeld, is an in-place shuffle. That is, given a preinitialized array, it shuffles the elements of the array in place, rather than producing a shuffled copy of the array. This can be an advantage if the array to be shuffled is large.
Shuffle Array using Random Class We can iterate through the array elements in a for loop. Then, we use the Random class to generate a random index number. Then swap the current index element with the randomly generated index element. At the end of the for loop, we will have a randomly shuffled array.
First, the algorithm spreads all the songs from the same artist all over the playlist as evenly as possible. Then, all the songs of all artists are collected an ordered by position.
Using shuffle() method from Random library to shuffle the given array. Here we are using shuffle method from the built-in random module to shuffle the entire array at once.
Let's do some proper benchmarking. Here's some code, with your shuffle renamed to shuffle1
, and my personal favorite variant thrown in as shuffle2
.
import System.Random
import Control.Monad
import Control.Monad.ST.Strict
import Data.STRef.Strict
import Data.Vector.Mutable
import Prelude as P
import Criterion.Main
shuffle1 :: RandomGen g => [a] -> g -> ([a], g)
shuffle1 [] g0 = ([],g0)
shuffle1 [x] g0 = ([x],g0)
shuffle1 xs g0 = (x:newtail,g2)
where (i,g1) = randomR (0, P.length $ P.tail xs) g0
(xs1,x:xs2) = P.splitAt i xs
(newtail,g2) = shuffle1 (xs1++xs2) g1
shuffle2 :: RandomGen g => [a] -> g -> ([a], g)
shuffle2 xs g0 = runST $ do
let l = P.length xs
v <- new l
sequence_ $ zipWith (unsafeWrite v) [0..] xs
let loop g i | i <= 1 = return g
| otherwise = do
let i' = i - 1
(j, g') = randomR (0, i') g
unsafeSwap v i' j
loop g' i'
gFinal <- loop g0 l
shuffled <- mapM (unsafeRead v) [0 .. l - 1]
return (shuffled, gFinal)
main = do
let s1 x = fst $ shuffle1 x g0
s2 x = fst $ shuffle2 x g0
arr = [0..1000] :: [Int]
g0 = mkStdGen 0
-- make sure these values are evaluated before the benchmark starts
print (g0, arr)
defaultMain [bench "shuffle1" $ nf s1 arr, bench "shuffle2" $ nf s2 arr]
And so, let's see some results:
carl@ubuntu:~/hask$ ghc -O2 shuffle.hs
[1 of 1] Compiling Main ( shuffle.hs, shuffle.o )
Linking shuffle ...
carl@ubuntu:~/hask$ ./shuffle
(1 1,[0, .. <redacted for brevity>])
warming up
estimating clock resolution...
mean is 5.762060 us (160001 iterations)
found 4887 outliers among 159999 samples (3.1%)
4751 (3.0%) high severe
estimating cost of a clock call...
mean is 42.13314 ns (43 iterations)
benchmarking shuffle1
mean: 10.95922 ms, lb 10.92317 ms, ub 10.99903 ms, ci 0.950
std dev: 193.8795 us, lb 168.6842 us, ub 244.6648 us, ci 0.950
found 1 outliers among 100 samples (1.0%)
variance introduced by outliers: 10.396%
variance is moderately inflated by outliers
benchmarking shuffle2
mean: 256.9394 us, lb 255.5414 us, ub 258.7409 us, ci 0.950
std dev: 8.042766 us, lb 6.460785 us, ub 12.28447 us, ci 0.950
found 1 outliers among 100 samples (1.0%)
1 (1.0%) high severe
variance introduced by outliers: 26.750%
variance is moderately inflated by outliers
Ok, my system is really noisy, and shouldn't be used for serious benchmarking of things with similar numbers. But that hardly matters here. shuffle2
is approximately 40x faster than shuffle1
on a list with 1001 elements. Due to the differences between O(n) and O(n^2), that will only increase in with larger lists. I'm certain that whatever your test code was timing, it wasn't the shuffle algorithm.
Actually, I have a guess. Your version is lazy enough to return results incrementally. 5 seconds is a plausible period of time for getting the first few results, if you never touch the generator after the call to it. Maybe that's what's going on in your timing.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With