Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Test for Poisson process

I would like to run some tests for the null hypothesis that the times of events I have was created from a homogeneous Poisson process (see e.g. http://en.wikipedia.org/wiki/Poisson_process ). For a fixed number of events the times should therefore look like a sorted version of a uniform distribution in the appropriate range. There is an implementation of the Kolmogorov-Smirnov test at http://docs.scipy.org/doc/scipy-0.7.x/reference/generated/scipy.stats.kstest.html but I can't see how to use it here as scipy.stats doesn't seem to know about Poisson processes.

As a simple example, this sample data should give a high p-value for any such test.

import random
nopoints = 100
max = 1000

points = sorted([random.randint(0,max) for j in xrange(nopoints)])

How can I make a sensible test for this problem?

From www.stat.wmich.edu/wang/667/classnotes/pp/pp.pdf‎ I see

" REMARK 6.3 ( TESTING POISSON ) The above theorem may also be used to test the hypothesis that a given counting process is a Poisson process. This may be done by observing the process for a fixed time t. If in this time period we observed n occurrences and if the process is Poisson, then the unordered occurrence times would be independently and uniformly distributed on (0, t]. Hence, we may test if the process is Poisson by testing the hypothesis that the n occurrence times come from a uniform (0, t] population. This may be done by standard statistical procedures such as the Kolmogorov-Smirov test."

like image 767
graffe Avatar asked Oct 03 '22 22:10

graffe


1 Answers

Warning: quickly written, some details are not verified

what's the appropriate estimator for exponential, degrees of freedom for chisquare test

based on lecture notes

The implications of homogeneity is not rejected with any of the three tests. Illustrates how to use kstest and chisquare test of scipy.stats

# -*- coding: utf-8 -*-
"""Tests for homogeneity of Poissson Process

Created on Tue Sep 17 13:50:25 2013

Author: Josef Perktold
"""

import numpy as np
from scipy import stats

# create an example dataset
nobs = 100
times_ia = stats.expon.rvs(size=nobs) # inter-arrival times
times_a = np.cumsum(times_ia) # arrival times
t_total = times_a.max()

# not used
#times_as = np.sorted(times_a)
#times_ia = np.diff(times_as)

bin_limits = np.array([ 0. ,  0.5,  1. ,  1.5,  2. ,  np.inf])
nfreq_ia, bins_ia = np.histogram(times_ia, bin_limits)


# implication: arrival times are uniform for fixed interval
# using times.max() means we don't really have a fixed interval
print stats.kstest(times_a, stats.uniform(0, t_total).cdf)

# implication: inter-arrival times are exponential
lambd = nobs * 1. / t_total
scale = 1. / lambd

expected_ia = np.diff(stats.expon.cdf(bin_limits, scale=scale)) * nobs
print stats.chisquare(nfreq_ia, expected_ia, ddof=1)

# implication: given total number of events, distribution of times is uniform
# binned version
n_mean_bin = 10
n_bins_a = nobs // 10
bin_limits_a = np.linspace(0, t_total+1e-7, n_bins_a + 1)
nfreq_a, bin_limits_a = np.histogram(times_a, bin_limits_a)
# expect uniform distributed over every subinterval
expected_a = np.ones(n_bins_a) / n_bins_a * nobs
print stats.chisquare(nfreq_a, expected_a, ddof=1)
like image 133
Josef Avatar answered Oct 07 '22 18:10

Josef