I would like to run some tests for the null hypothesis that the times of events I have was created from a homogeneous Poisson process (see e.g. http://en.wikipedia.org/wiki/Poisson_process ). For a fixed number of events the times should therefore look like a sorted version of a uniform distribution in the appropriate range. There is an implementation of the Kolmogorov-Smirnov test at http://docs.scipy.org/doc/scipy-0.7.x/reference/generated/scipy.stats.kstest.html but I can't see how to use it here as scipy.stats doesn't seem to know about Poisson processes.
As a simple example, this sample data should give a high p-value for any such test.
import random
nopoints = 100
max = 1000
points = sorted([random.randint(0,max) for j in xrange(nopoints)])
How can I make a sensible test for this problem?
From www.stat.wmich.edu/wang/667/classnotes/pp/pp.pdf I see
" REMARK 6.3 ( TESTING POISSON ) The above theorem may also be used to test the hypothesis that a given counting process is a Poisson process. This may be done by observing the process for a fixed time t. If in this time period we observed n occurrences and if the process is Poisson, then the unordered occurrence times would be independently and uniformly distributed on (0, t]. Hence, we may test if the process is Poisson by testing the hypothesis that the n occurrence times come from a uniform (0, t] population. This may be done by standard statistical procedures such as the Kolmogorov-Smirov test."
Warning: quickly written, some details are not verified
what's the appropriate estimator for exponential, degrees of freedom for chisquare test
based on lecture notes
The implications of homogeneity is not rejected with any of the three tests. Illustrates how to use kstest and chisquare test of scipy.stats
# -*- coding: utf-8 -*-
"""Tests for homogeneity of Poissson Process
Created on Tue Sep 17 13:50:25 2013
Author: Josef Perktold
"""
import numpy as np
from scipy import stats
# create an example dataset
nobs = 100
times_ia = stats.expon.rvs(size=nobs) # inter-arrival times
times_a = np.cumsum(times_ia) # arrival times
t_total = times_a.max()
# not used
#times_as = np.sorted(times_a)
#times_ia = np.diff(times_as)
bin_limits = np.array([ 0. , 0.5, 1. , 1.5, 2. , np.inf])
nfreq_ia, bins_ia = np.histogram(times_ia, bin_limits)
# implication: arrival times are uniform for fixed interval
# using times.max() means we don't really have a fixed interval
print stats.kstest(times_a, stats.uniform(0, t_total).cdf)
# implication: inter-arrival times are exponential
lambd = nobs * 1. / t_total
scale = 1. / lambd
expected_ia = np.diff(stats.expon.cdf(bin_limits, scale=scale)) * nobs
print stats.chisquare(nfreq_ia, expected_ia, ddof=1)
# implication: given total number of events, distribution of times is uniform
# binned version
n_mean_bin = 10
n_bins_a = nobs // 10
bin_limits_a = np.linspace(0, t_total+1e-7, n_bins_a + 1)
nfreq_a, bin_limits_a = np.histogram(times_a, bin_limits_a)
# expect uniform distributed over every subinterval
expected_a = np.ones(n_bins_a) / n_bins_a * nobs
print stats.chisquare(nfreq_a, expected_a, ddof=1)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With