
Generate random numbers distributed by Zipf

The Zipf probability distribution is often used to model file size distributions or item access distributions in P2P systems (see, e.g., "Web Caching and Zipf-like Distributions: Evidence and Implications"), but neither Boost nor the GSL (GNU Scientific Library) provides an implementation to generate random numbers using this distribution. I have not found a (trustworthy) implementation using the common search engines.

How can random numbers distributed according to the Zipf distribution be generated using a U(0,1) random generator, e.g. the Mersenne Twister?

dmeister asked Sep 02 '09 10:09


4 Answers

Here's a Python Zipf-like distribution generator for n items with parameter alpha >= 0:

import bisect
import itertools
import math
import random

class ZipfGenerator:

    def __init__(self, n, alpha):
        # Unnormalized Zipf weights 1 / i**alpha for ranks i = 1..n:
        tmp = [1. / (math.pow(float(i), alpha)) for i in range(1, n+1)]
        # Running sums of the weights, with a leading 0
        # (itertools.accumulate replaces reduce, which is not a builtin in Python 3):
        zeta = [0.] + list(itertools.accumulate(tmp))

        # Store the translation map (the normalized cumulative distribution):
        self.distMap = [x / zeta[-1] for x in zeta]

    def next(self):
        # Take a uniform 0-1 pseudo-random value:
        u = random.random()

        # Translate the Zipf variable; the result is a 0-based rank in 0..n-1:
        return bisect.bisect(self.distMap, u) - 1
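
The same table-lookup idea can also be sketched with the standard library's random.choices, which accepts cumulative weights and performs the bisection internally. The helper name zipf_ranks and its parameters are made up for this illustration, and unlike the class above it returns 1-based ranks and rebuilds the table on every call:

```python
import itertools
import random

def zipf_ranks(n, alpha, k, rng=random):
    """Return k ranks in 1..n, drawn with probability proportional to 1 / rank**alpha."""
    # Cumulative (unnormalized) weights; random.choices does not need them normalized.
    cum_weights = list(itertools.accumulate(1.0 / i ** alpha for i in range(1, n + 1)))
    return rng.choices(range(1, n + 1), cum_weights=cum_weights, k=k)

# Usage: 10,000 draws over 1,000 items with alpha = 1, seeded for repeatability.
rng = random.Random(42)
ranks = zipf_ranks(n=1000, alpha=1.0, k=10_000, rng=rng)
```

Building the table costs O(n) time and memory, and each draw costs O(log n), so for many draws it is worth computing the cumulative weights once and reusing them.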
stanga answered Sep 19 '22 15:09


zipfR is a free and open source library implemented in R. VGAM is another R package that also implements Zipf.

It's also worth noting that the GNU Scientific Library has an implementation of the Pareto distribution, which is effectively the continuous analogue of the discrete Zipf distribution.

Also, the Zeta distribution is equivalent to Zipf for infinite N. The GSL has an implementation of the Riemann zeta function, so you could use that to construct the distribution yourself.
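
For reference, the relationship between the two distributions can be written out as follows. The symbols are chosen for this illustration and do not appear in the answer itself: s is the exponent, N the number of items, and H_{N,s} the generalized harmonic number.

```latex
% Zipf distribution on ranks k = 1, ..., N with exponent s >= 0
P_{\mathrm{Zipf}}(X = k) = \frac{k^{-s}}{H_{N,s}},
\qquad H_{N,s} = \sum_{i=1}^{N} i^{-s}

% Zeta distribution: the limit N -> infinity, which requires s > 1
P_{\mathrm{zeta}}(X = k) = \frac{k^{-s}}{\zeta(s)},
\qquad \zeta(s) = \sum_{i=1}^{\infty} i^{-s}
```

Since the infinite sum converges only for s > 1, building the distribution from the zeta function does not cover exponents of 1 or less; for those, the finite normalizer H_{N,s} has to be summed directly.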

ire_and_curses answered Sep 18 '22 15:09


numpy.random.zipf generates Zipf samples in Python.
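
A minimal sketch of how this might be used, with two caveats: NumPy's zipf requires the parameter a to be greater than 1, and it draws from the unbounded zeta distribution rather than from a fixed set of n items. The helper bounded_zipf below is hypothetical (it is not part of NumPy); it restricts the samples to 1..n by discarding larger draws:

```python
import numpy as np

def bounded_zipf(a, n, size, seed=None):
    """Draw `size` ranks in 1..n by rejecting zeta samples larger than n (needs a > 1)."""
    rng = np.random.default_rng(seed)
    out = np.empty(0, dtype=np.int64)
    while out.size < size:
        draws = rng.zipf(a, size=size)  # unbounded positive integers
        out = np.concatenate([out, draws[draws <= n]])
    return out[:size]

# Usage: 10,000 ranks over 100 items with exponent 2, seeded for repeatability.
samples = bounded_zipf(a=2.0, n=100, size=10_000, seed=0)
```

Conditioning the zeta distribution on values up to n gives the Zipf distribution over n items with the same exponent, so the rejection step does not distort the relative frequencies.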

Yuval F answered Sep 20 '22 15:09


A very efficient algorithm for generating Zipf-distributed random variates was recently developed for the next versions (>= 3.6) of the Apache Commons Math library (see code here). It uses rejection-inversion sampling and also works for exponents less than 1. It does not require precalculating the CDF and keeping it in memory. Furthermore, the cost of generating one sample is constant and does not increase with the number of items.
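
To give a feel for the technique, here is a simplified Python sketch of rejection-inversion sampling. It is not the Apache Commons Math code: the class and method names are invented for this illustration, it omits the numerical safeguards a library needs for exponents very close to 1, and it leaves out the extra shortcut test that avoids some function evaluations. It uses the hat function h(x) = x^(-e) and its antiderivative H.

```python
import math
import random

class RejectionInversionZipf:
    """Sketch: sample ranks 1..n with P(k) proportional to k**-exponent."""

    def __init__(self, n, exponent, rng=None):
        if n < 1 or exponent < 0:
            raise ValueError("need n >= 1 and exponent >= 0")
        self.n = n
        self.e = float(exponent)
        self.rng = rng or random.Random()
        # The uniform variate is drawn between H(1.5) - h(1) and H(n + 0.5).
        self.u_low = self._H(1.5) - 1.0
        self.u_high = self._H(n + 0.5)

    def _h(self, x):
        return x ** -self.e

    def _H(self, x):
        # Antiderivative of h, shifted so that H(1) = 0.
        if self.e == 1.0:
            return math.log(x)
        return (x ** (1.0 - self.e) - 1.0) / (1.0 - self.e)

    def _H_inv(self, y):
        if self.e == 1.0:
            return math.exp(y)
        base = 1.0 + y * (1.0 - self.e)
        if base <= 0.0:
            # Rounding guard; should only be reachable for exponents above 1.
            return math.inf
        return base ** (1.0 / (1.0 - self.e))

    def sample(self):
        while True:
            r = 1.0 - self.rng.random()  # uniform in (0, 1]
            u = self.u_high + r * (self.u_low - self.u_high)
            x = self._H_inv(u)
            # Round to the nearest rank and clamp it to 1..n.
            k = self.n if x >= self.n else max(1, int(x + 0.5))
            # Accept if u falls in the part of the interval that belongs to h(k).
            if u >= self._H(k + 0.5) - self._h(k):
                return k

# Usage: one draw over a million items with exponent 0.8; no table is built.
gen = RejectionInversionZipf(1_000_000, 0.8, random.Random(7))
rank = gen.sample()
```

Setup needs only two evaluations of H, and each draw evaluates a handful of powers regardless of n, which is where the constant per-sample cost comes from.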

otmar answered Sep 19 '22 15:09