Need a fast way to count and sum an iterable in a single pass

Tags:

python

python-3.x

Can anyone help me? I'm trying to come up with a way to compute

>>> sum_widths = sum(col.width for col in cols if not col.hide)

and also count the number of items in this sum, without having to make two passes over cols.

It seems unbelievable, but after scanning the standard library (built-in functions, itertools, functools, etc.), I couldn't find a function that counts the number of members in an iterable. I found itertools.count, which sounds like what I want, but it's really just a deceptively named range function.
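To make the distinction concrete, here is a minimal sketch contrasting itertools.count (an endless counter) with one common way to count the items of an arbitrary iterable. The helper name count_items is hypothetical and not part of the standard library.

```python
from itertools import count, islice

# itertools.count is an endless arithmetic counter, not a "how many" function.
first_three = list(islice(count(10), 3))  # takes 10, 11, 12 and stops

def count_items(iterable):
    """Count the items of any iterable by consuming it once (hypothetical helper)."""
    return sum(1 for _ in iterable)

# Works on generators too, which have no len().
n = count_items(x for x in range(10) if x % 2 == 0)
```

Note that counting this way consumes a generator, so the same generator cannot be summed afterwards, which is why a combined count-and-sum pass is needed.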

After a little thought I came up with the following (which is so simple that the lack of a library function may be excusable, except for its obtuseness):

>>> visible_col_count = sum(col is col for col in cols if not col.hide)

However, using these two functions requires two passes of the iterable, which just rubs me the wrong way.

As an alternative, the following function does what I want:

>>> def count_and_sum(iterable):
...     # Names chosen to avoid shadowing the built-ins iter and sum.
...     count = total = 0
...     for item in iterable:
...         count += 1
...         total += item
...     return count, total

The problem with this is that, according to timeit, it takes 100 times as long as calling sum on a generator expression.

If anybody can come up with a simple one-liner which does what I want, please let me know (using Python 3.3).

Edit 1

Lots of great ideas here, guys. Thanks to all who replied. It will take me a while to digest all these answers, but I will, and then I'll try to pick one to accept.

Edit 2

I repeated the timings on my two humble suggestions (count_and_sum function and 2 separate sum functions) and discovered that my original timing was way off, probably due to an auto-scheduled backup process running in the background.

I also timed most of the excellent suggestions given as answers here, all with the same model. Analysing these answers has been quite an education for me: new uses for deque, enumerate and reduce and first time for count and accumulate. Thanks to all!

Here are the results (from my slow netbook) using the software I'm developing for display:

┌───────────────────────────────────────────────────────┐
│                 Count and Sum Timing                  │
├──────────────────────────┬───────────┬────────────────┤
│          Method          │Time (usec)│Time (% of base)│
├──────────────────────────┼───────────┼────────────────┤
│count_and_sum (base)      │        7.2│            100%│
│Two sums                  │        7.5│            104%│
│deque enumerate accumulate│        7.3│            101%│
│max enumerate accumulate  │        7.3│            101%│
│reduce                    │        7.4│            103%│
│count sum                 │        7.3│            101%│
└──────────────────────────┴───────────┴────────────────┘

(I didn't time the complex and fold methods because they seemed just too obscure, but thanks anyway.)

Since there's very little difference in timing among all these methods, I decided to use the count_and_sum function (with an explicit for loop) as the most readable, explicit and simple (Python Zen), and it also happens to be the fastest!
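As a rough sketch of how the explicit-loop function applies to the original column problem: the Col type and the sample data below are hypothetical stand-ins, since the question does not show how cols is defined.

```python
from collections import namedtuple

# Hypothetical stand-in for the question's column objects.
Col = namedtuple("Col", ["width", "hide"])

def count_and_sum(iterable):
    """Return (count, total) for an iterable of numbers in a single pass."""
    count = total = 0
    for item in iterable:
        count += 1
        total += item
    return count, total

cols = [Col(10, False), Col(25, True), Col(7, False)]

# The generator expression is consumed exactly once.
visible_col_count, sum_widths = count_and_sum(
    col.width for col in cols if not col.hide
)
```

Passing a generator expression means the filter on col.hide is evaluated only once per column, which is the point of the single-pass requirement.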

I wish I could accept one of these amazing answers as correct but they are all equally good though more or less obscure, so I'm just up-voting everybody and accepting my own answer as correct (count_and_sum function) since that's what I'm using.

What was that about "There should be one-- and preferably only one --obvious way to do it."?

asked Feb 12 '14 01:02

Don O'Donnell

4 Answers

Using complex numbers

z = [1, 2, 4, 5, 6]
y = sum(x + 1j for x in z)
sum_z, count_z = y.real, int(y.imag)
print(sum_z, count_z)
18.0 5
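One caveat worth illustrating: a complex number stores its parts as floats, so the sum comes back as a float and very large integers can lose precision. The sketch below uses deliberately extreme sample values (my own, not from the answer) to show the effect.

```python
# Integers above 2**53 cannot all be represented exactly as floats.
z = [2**53 + 1, 1]

y = sum(x + 1j for x in z)
float_total, count_z = y.real, int(y.imag)

# A plain integer sum stays exact.
exact_total = sum(z)

# True when the complex-number trick has lost precision.
lost_precision = int(float_total) != exact_total
```

For small values such as column widths this is unlikely to matter, but the count is also carried in a float, so the same limit applies to it in principle.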

answered Oct 22 '22 07:10

iruvar

I don't know about speed, but this is kind of pretty:

>>> from itertools import accumulate
>>> it = range(10)
>>> max(enumerate(accumulate(it), 1))
(10, 45)
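A short sketch of why this works, plus one way to handle empty input. The wrapper name safe_count_and_sum and the (0, 0) fallback are assumptions added for illustration; the one-liner itself raises ValueError when the iterable is empty.

```python
from itertools import accumulate

values = [3, -5, 2]

# Running totals paired with a 1-based position.
pairs = list(enumerate(accumulate(values), 1))

# Tuples compare on their first element first, and positions are unique,
# so max() returns the last pair even when the running total decreases.
count, total = max(pairs)

def safe_count_and_sum(iterable):
    """Hypothetical wrapper returning (0, 0) for an empty iterable."""
    try:
        return max(enumerate(accumulate(iterable), 1))
    except ValueError:  # max() was given an empty sequence
        return 0, 0
```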

answered Oct 22 '22 07:10

DSM

An adaptation of DSM's answer, using deque(..., maxlen=1) to save memory.

import itertools 
from collections import deque 
deque(enumerate(itertools.accumulate(x), 1), maxlen=1)
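The expression above evaluates to a deque holding a single (count, sum) tuple, so the pair still has to be pulled out. A minimal sketch, assuming a hypothetical wrapper name and a (0, 0) result for empty input:

```python
from collections import deque
from itertools import accumulate

def count_and_sum_deque(iterable):
    """Hypothetical wrapper: (count, total) from the deque one-liner."""
    # maxlen=1 discards every pair except the most recent one.
    last = deque(enumerate(accumulate(iterable), 1), maxlen=1)
    # An empty input leaves the deque empty, so fall back to (0, 0).
    return last[0] if last else (0, 0)

count, total = count_and_sum_deque(iter([1, 2, 4, 5, 6]))
```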

Timing code in IPython:

import itertools, random
from collections import deque

def count_and_sum(iter):
    count = sum = 0
    for item in iter:
        count += 1
        sum += item
    return count, sum

X = [random.randint(0, 10) for _ in range(10**7)]
%timeit count_and_sum(X)
%timeit deque(enumerate(itertools.accumulate(X), 1), maxlen=1)
%timeit (max(enumerate(itertools.accumulate(X), 1)))

Results, in the same order as the %timeit calls above (the deque version is now faster than the OP's method):

1 loops, best of 3: 1.08 s per loop
1 loops, best of 3: 659 ms per loop
1 loops, best of 3: 1.19 s per loop

answered Oct 22 '22 07:10

M4rtini

Here's some timing data that might be of interest:

import timeit

setup = '''
import random, functools, itertools, collections

x = [random.randint(0, 10) for _ in range(10**5)]

def count_and_sum(it):
    c, s = 0, 0
    for i in it:
        c += 1
        s += i
    return c, s

def two_pass(it):
    return sum(i for i in it), sum(True for i in it)

def functional(it):
    return functools.reduce(lambda pair, x: (pair[0]+1, pair[1]+x), it, [0, 0])

def accumulator(it):
    return max(enumerate(itertools.accumulate(it), 1))

def complex(it):
    cpx = sum(x + 1j for x in it)
    return cpx.real, int(cpx.imag)

def dequed(it):
    return collections.deque(enumerate(itertools.accumulate(it), 1), maxlen=1)

'''

number = 100
for stmt in ['count_and_sum(x)',
             'two_pass(x)',
             'functional(x)',
             'accumulator(x)',
             'complex(x)',
             'dequed(x)']:
    print('{:.4}'.format(timeit.timeit(stmt=stmt, setup=setup, number=number)))

Result:

3.404 # OP's one-pass method
3.833 # OP's two-pass method
8.405 # Timothy Shields's fold method
3.892 # DSM's accumulate-based method
4.946 # 1_CR's complex-number method
2.002 # M4rtini's deque-based modification of DSM's method

Given these results, I'm not really sure how the OP is seeing a 100x slowdown with the one-pass method. Even if the data looks radically different from a list of random integers, that just shouldn't happen.

Also, M4rtini's solution looks like the clear winner.

To clarify, these results are in CPython 3.2.3. For a comparison to PyPy3, see James_pic's answer, which shows some serious gains from JIT compilation for some methods (also mentioned in a comment by M4rtini).

answered Oct 22 '22 06:10

5 revs
