Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How can I fit a curve to a histogram distribution?

Someone asked me a question via e-mail about integer partitions the other day (as I had released a Perl module, Integer::Partition, to generate them), that I was unable to answer.

Background: here are all the integer partitions of 7 (the sum of each row equals 7).

7
6 1
5 2
5 1 1
4 3
4 2 1
4 1 1 1
3 3 1
3 2 2
3 2 1 1
3 1 1 1 1
2 2 2 1
2 2 1 1 1
2 1 1 1 1 1
1 1 1 1 1 1 1

Now if we look at the lengths of each partition and count how many there are of each length:

1 1
2 3
3 4
4 3
5 2
6 1
7 1

... we see one partition has a length of 1 (7), one has a length of 7 (1 1 1 1 1 1 1). There are 4 partitions that have a length of 3: (5 1 1), (4 2 1), (3 3 1), (3 2 2).

For larger numbers of N, if you graph the distribution of partition lengths, an asymetric curve emerges, skewed towards the origin. If you're curious, graph the following partition length counts for N=40.

1 20 133 478 1115 1945 2738 3319 3589 3590 3370 3036 2637 2241 1861 1530 1236 995 790 627 490 385 297 231 176 135 101 77 56 42 30 22 15 11 7 5 3 2 1 1

If you're interested in generating these distribution counts, here's the code I used:

#! /usr/local/bin/perl

use strict;
use warnings;

use Integer::Partition;

my $n = shift || 1;

while (1) {
    my $start = time;
    my $i = Integer::Partition->new($n);
    my %size;
    while (my $p = $i->next) {
        $size{scalar @$p}++;
    }

    open my $out, '>>', "bucket-count.out";
    for my $s (sort {$a <=> $b} keys %size) {
        print $out "$n\t$s\t$size{$s}\n";
    }
    close $out;
    my $delta = time - $start;
    print "$n\t$delta secs\n";
    ++$n;
}

(note: on my computer, N=90 takes about 10 minutes to generate).

So my question is: what equation can be used to match the observed distribution curve? Is it a Gauss (can a Gaussian distribution be asymetric?) or Poisson distribution, or something else?

How do I solve it for N? If I remember my maths from high-school, I can determine the peak by solving when the derivative intersects 0. How do I produce the derivative? I've searched the web but all I get back are abstruse mathematical papers. I just need some code :)

like image 660
dland Avatar asked Nov 05 '22 22:11

dland


1 Answers

I think a poisson distribution is a reasonable estimate. Given that presumption your problem now turns to one of fnding the maximum frequency, k, given N. I think you have two approaches:

  1. figure it out from a mathematical standpoint (I would start by looking at combinatorics, but that may not be a particularly good steer)
  2. presume it is poisson and measure the peak for any given N, as you have above.

Once you have the peak (k), estimating lambda should be straightforward (try a few out) and you have your curve.

Another approach is to work the whole thing up in python and ask on the numpy or scipy boards :-)

HTH

like image 148
Simon Avatar answered Nov 14 '22 02:11

Simon