I am implementing an approximate counting algorithm where we:
Maintain a counter X using log (log n) bits
Initialize X to 0
When an item arrives, increase X by 1 with probability (½)X
When the stream is over, output 2X − 1 so that E[2X]= n + 1
My implementation is as follows:
import System.Random
type Prob = Double
type Tosses = Int
-- * for sake of simplicity we assume 0 <= p <= 1
tos :: Prob -> StdGen -> (Bool,StdGen)
tos p s = (q <= 100*p, s')
where (q,s') = randomR (1,100) s
toses :: Prob -> Tosses -> StdGen -> [(Bool,StdGen)]
toses _ 0 _ = []
toses p n s = let t@(b,s') = tos p s in t : toses p (pred n) s'
toses' :: Prob -> Tosses -> StdGen -> [Bool]
toses' p n = fmap fst . toses p n
morris :: StdGen -> [a] -> Int
morris s xs = go s xs 0 where
go _ [] n = n
go s (_:xs) n = go s' xs n' where
(h,s') = tos (0.5^n) s
n' = if h then succ n else n
main :: IO Int
main = do
s <- newStdGen
return $ morris s [1..10000]
The problem is that my X is always incorrect for any |stream| > 2
, and it seems like for all StdGen
and |stream| > 1000
, X = 7
I tested the same algorithm in Matlab and it works there, so I assume it's either
an issue with my random number generator, or
raising 1/2 to a large n in Double
Please suggest a path forward?
If an algorithm is randomized, its running time is also random, which means we can define the expected value of its running time. A well-known example is Quicksort: if we pick the pivots at random, we can prove that its expected running time becomes O(n log n), even though its worst case running time remains O(n^2).
An algorithm is randomized if its behavior is determined not only by its input but also by values produced by a random-number generator. That is why you analyze what is expected. A worst case analysis is on the input only.
These algorithms are typically analysed for expected worst case. To compute expected time taken in worst case, all possible values of the used random variable needs to be considered in worst case and time taken by every possible value needs to be evaluated.
A randomized algorithm is a technique that uses a source of randomness as part of its logic. It is typically used to reduce either the running time, or time complexity; or the memory used, or space complexity, in a standard algorithm.
The problem is actually very simple: with randomR (1,100)
you preclude values within the first percent, so you have a complete cutoff at high powers of 1/2 (which all lie in that small interval). Actually a general thing: ranges should start at zero, not at one†, unless there's a specific reason.
But why even use a range of 100 in the first place? I'd just make it
tos :: Prob -> StdGen -> (Bool,StdGen)
tos p s = (q <= p, s')
where (q,s') = randomR (0,1) s
†I know, Matlab gets this wrong all over the place. Just one of the many horrible things about that language.
Unrelated to your problem: as chi remarked this kind of code looks a lot nicer if you use a suitable random monad, instead of manually passing around StdGen
s.
import Data.Random
import Data.Random.Source.Std
type Prob = Double
tos :: Prob -> RVar Bool
tos p = do
q <- uniform 0 1
return $ q <= p
morris :: [a] -> RVar Int
morris xs = go xs 0 where
go [] n = return n
go (_:xs) n = do
h <- tos (0.5^n)
go xs $ if h then succ n else n
morrisTest :: Int -> IO Int
morrisTest n = do
runRVar (morris [1..n]) StdRandom
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With