compute mean in python for a generator

Question

I'm doing some statistics work, I have a (large) collection of random numbers to compute the mean of, I'd like to work with generators, because I just need to compute the mean, so I don't need to store the numbers.

The problem is that numpy.mean breaks if you pass it a generator. I can write a simple function to do what I want, but I'm wondering if there's a proper, built-in way to do this?

It would be nice if I could say "sum(values)/len(values)", but len doesn't work for genetators, and sum already consumed values.

here's an example:

import numpy 

def my_mean(values):
    n = 0
    Sum = 0.0
    try:
        while True:
            Sum += next(values)
            n += 1
    except StopIteration: pass
    return float(Sum)/n

X = [k for k in range(1,7)]
Y = (k for k in range(1,7))

print numpy.mean(X)
print my_mean(Y)

these both give the same, correct, answer, buy my_mean doesn't work for lists, and numpy.mean doesn't work for generators.

I really like the idea of working with generators, but details like this seem to spoil things.

Erik · Accepted Answer

In general if you're doing a streaming mean calculation of floating point numbers, you're probably better off using a more numerically stable algorithm than simply summing the generator and dividing by the length.

The simplest of these (that I know) is usually credited to Knuth, and also calculates variance. The link contains a python implementation, but just the mean portion is copied here for completeness.

def mean(data):
    n = 0
    mean = 0.0
 
    for x in data:
        n += 1
        mean += (x - mean)/n

    if n < 1:
        return float('nan')
    else:
        return mean

I know this question is super old, but it's still the first hit on google, so it seemed appropriate to post. I'm still sad that the python standard library doesn't contain this simple piece of code.

Mark Ransom · Answer

Just one simple change to your code would let you use both. Generators were meant to be used interchangeably to lists in a for-loop.

def my_mean(values):
    n = 0
    Sum = 0.0
    for v in values:
        Sum += v
        n += 1
    return Sum / n

jfs · Answer

def my_mean(values):
    total = 0
    for n, v in enumerate(values, 1):
        total += v
    return total / n

print my_mean(X)
print my_mean(Y)

There is statistics.mean() in Python 3.4 but it calls list() on the input:

def mean(data):
    if iter(data) is data:
        data = list(data)
    n = len(data)
    if n < 1:
        raise StatisticsError('mean requires at least one data point')
    return _sum(data)/n

where _sum() returns an accurate sum (math.fsum()-like function that in addition to float also supports Fraction, Decimal).

Jimmy · Answer

The old-fashioned way to do it:

def my_mean(values):
   sum, n = 0, 0
   for x in values:
      sum += x
      n += 1
   return float(sum)/n

compute mean in python for a generator

Tags:

python

generator

mean

nick maxwell

4 Answers

Erik

Mark Ransom

jfs

Jimmy

Recent Activity

Donate For Us

compute mean in python for a generator

Tags:

python

generator

mean

nick maxwell

4 Answers

Erik

Mark Ransom

jfs

Jimmy

Related questions

Recent Activity

Donate For Us