
Randomized algorithm not behaving as expected

I am implementing an approximate counting algorithm where we:

Maintain a counter X using log (log n) bits

  • Initialize X to 0

  • When an item arrives, increase X by 1 with probability (1/2)^X

  • When the stream is over, output 2^X − 1, so that E[2^X] = n + 1 (a quick check of this follows below)
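
Quick check of that expectation: writing X' for the counter after one more item, E[2^X'] = E[ 2^(−X) · 2^(X+1) + (1 − 2^(−X)) · 2^X ] = E[2^X + 1] = E[2^X] + 1, so starting from E[2^0] = 1 it follows by induction that E[2^X] = n + 1 after n items.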

My implementation is as follows:

import System.Random

type Prob   = Double
type Tosses = Int

-- for the sake of simplicity we assume 0 <= p <= 1
tos :: Prob -> StdGen -> (Bool,StdGen)
tos p s = (q <= 100*p, s')
  where (q,s') = randomR (1,100) s

toses :: Prob -> Tosses -> StdGen -> [(Bool,StdGen)]
toses _ 0 _ = []
toses p n s = let t@(b,s') = tos p s in t : toses p (pred n) s'

toses' :: Prob -> Tosses -> StdGen -> [Bool]
toses' p n = fmap fst . toses p n

morris :: StdGen -> [a] -> Int
morris s xs = go s xs 0 where
  go _ []     n = n
  go s (_:xs) n = go s' xs n' where
    (h,s') = tos (0.5^n) s            -- toss a coin that should succeed with probability (1/2)^n
    n'     = if h then succ n else n

main :: IO Int
main = do
  s <- newStdGen
  return $ morris s [1..10000]

The problem is that my X is always incorrect for any |stream| > 2, and it seems that for every StdGen and every |stream| > 1000, X comes out as 7.

I tested the same algorithm in Matlab and it works there, so I assume it's either

  1. an issue with my random number generator, or

  2. raising 1/2 to a large n in Double

Can you suggest a path forward?
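
In case it helps to narrow down (1), here is a rough sketch (tosRate is just a throwaway name) that estimates the empirical success rate of tos via the toses' above, so the measured rate can be compared against p directly:

-- sketch: estimate how often `tos p` comes up True over k trials
tosRate :: Prob -> Int -> IO Double
tosRate p k = do
  s <- newStdGen
  let hits = length (filter id (toses' p k s))
  return (fromIntegral hits / fromIntegral k)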

asked Dec 06 '15 by xiaolingxiao


People also ask

What is the expected running time of randomized algorithm?

If an algorithm is randomized, its running time is also random, which means we can define the expected value of its running time. A well-known example is Quicksort: if we pick the pivots at random, we can prove that its expected running time becomes O(n log n), even though its worst case running time remains O(n^2).
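
For illustration, here is a minimal sketch of that idea in Haskell: a list-based quicksort with a uniformly random pivot, purely illustrative rather than the in-place array version the O(n log n) analysis usually refers to.

import System.Random (randomRIO)

-- quicksort with a uniformly chosen pivot; the expected number of
-- comparisons is O(n log n), while the worst case remains O(n^2)
qsortR :: Ord a => [a] -> IO [a]
qsortR [] = return []
qsortR xs = do
  i <- randomRIO (0, length xs - 1)
  let p    = xs !! i
      rest = take i xs ++ drop (i + 1) xs
  (++) <$> qsortR (filter (< p) rest)
       <*> ((p :) <$> qsortR (filter (>= p) rest))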

Why do we analyze the expected running time of a randomized algorithm and not its worst case running time?

An algorithm is randomized if its behavior is determined not only by its input but also by values produced by a random-number generator. That is why you analyze the expected running time: the expectation is taken over the random values, whereas a worst-case analysis is over the input only.

What are we typically analyzing with randomized algorithms?

These algorithms are typically analysed for the expected worst case. To compute the expected time taken on a worst-case input, all possible values of the random variable used need to be considered, and the time taken for each value needs to be evaluated.

What is a randomized algorithm and what is its advantage?

A randomized algorithm is a technique that uses a source of randomness as part of its logic. It is typically used to reduce either the running time (time complexity) or the memory used (space complexity) compared with a standard deterministic algorithm.


1 Answer

The problem is actually very simple: with randomR (1,100) you exclude values in the first percent of the range, so there is a complete cutoff at high powers of 1/2 (which all lie in that small interval). As a general rule, ranges should start at zero, not at one, unless there's a specific reason otherwise.
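
Concretely, that is where the plateau at X = 7 comes from: the toss that would take X from 7 to 8 uses p = 0.5^7 ≈ 0.0078, so 100*p ≈ 0.78 < 1, and since randomR (1,100) never yields a q below 1, that toss can never succeed.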

But why even use a range of 100 in the first place? I'd just make it

tos :: Prob -> StdGen -> (Bool,StdGen)
tos p s = (q <= p, s')
  where (q,s') = randomR (0,1) s
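
With randomR (0,1) on a Double, q is uniform over the whole unit interval, so q <= p succeeds with probability p for any 0 <= p <= 1, however small p gets.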

I know, Matlab gets this wrong all over the place. Just one of the many horrible things about that language.


Unrelated to your problem: as chi remarked, this kind of code looks a lot nicer if you use a suitable random monad instead of manually passing StdGens around.

import Data.Random
import Data.Random.Source.Std

type Prob   = Double

-- coin toss that comes up True with probability p
tos :: Prob -> RVar Bool
tos p = do
  q <- uniform 0 1
  return $ q <= p

-- Morris counter: increment with probability 2^(-current counter)
morris :: [a] -> RVar Int
morris xs = go xs 0 where
  go []     n = return n
  go (_:xs) n = do
    h <- tos (0.5^n)
    go xs $ if h then succ n else n

morrisTest :: Int -> IO Int
morrisTest n = do
  runRVar (morris [1..n]) StdRandom
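
For what it's worth, running morrisTest 10000 a few times should now give counters in the right ballpark instead of being stuck at 7: since E[2^X] = n + 1 and log2 10001 ≈ 13.3, the raw counter X typically lands somewhere around 12–15 (the estimator has high variance), and the actual count estimate is 2^X − 1.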
answered by leftaroundabout