python histogram one-liner

Tags: python, reduce, histogram, counting

There are many ways to write a Python program that computes a histogram.

By histogram, I mean a function that counts the occurrences of objects in an iterable and outputs the counts in a dictionary. For example:

>>> L = 'abracadabra'
>>> histogram(L)
{'a': 5, 'b': 2, 'c': 1, 'd': 1, 'r': 2}

One way to write this function is:

def histogram(L):
    d = {}
    for x in L:
        if x in d:
            d[x] += 1
        else:
            d[x] = 1
    return d

Are there more concise ways of writing this function?

If we had dictionary comprehensions in Python, we could write:

>>> { x: L.count(x) for x in set(L) }

but since Python 2.6 doesn't have them, we have to write:

>>> dict([(x, L.count(x)) for x in set(L)])

Although this approach may be readable, it is not efficient: L is walked through once for every distinct element. Furthermore, this won't work for single-use generators; the function should work equally well for generators such as:

def gen(L):
    for x in L:
        yield x
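To make the single-use problem concrete, here is a small sketch (Python 3 syntax; `count_based` is just a name made up here for the `dict`/`count` approach above). A list gives the right answer, while a generator has no `count` method, and the `set(L)` call has already used it up by the time that error appears.

```python
def gen(L):
    for x in L:
        yield x

def count_based(L):
    # The dict/count approach: one full pass over L per distinct element.
    return dict([(x, L.count(x)) for x in set(L)])

print(count_based(list('abracadabra')))

g = gen('abracadabra')
try:
    count_based(g)
except AttributeError as e:
    # Generators have no .count method.
    print("fails on a generator:", e)

# set(L) already consumed the generator, so nothing is left to count.
print(list(g))  # []
```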

We might try to use the reduce function (R.I.P.):

>>> reduce(lambda d,x: dict(d, x=d.get(x,0)+1), L, {}) # wrong!

Oops, this does not work: the key name is 'x', not x. :(
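A minimal sketch of what goes wrong, in Python 3 syntax (the names `wrong` and `right` are only for illustration). In `dict(d, x=...)` the `x` is a keyword argument name, so the key is always the string `'x'`. Unpacking a one-item dict gets around that, but only when the keys are strings, so it is not a general fix.

```python
x = 'a'
d = {}

# Keyword argument: the key is the literal name 'x', not the value of x.
wrong = dict(d, x=d.get(x, 0) + 1)

# Unpacking a one-item dict uses the value of x as the key.
# This only works when x is a string, since keyword names must be strings.
right = dict(d, **{x: d.get(x, 0) + 1})

print(wrong)  # {'x': 1}
print(right)  # {'a': 1}
```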

I ended up with:

>>> reduce(lambda d,x: dict(d.items() + [(x, d.get(x, 0)+1)]), L, {})

(In Python 3, we would have to write list(d.items()) instead of d.items(), but it's hypothetical, since there is no reduce there.)

Please beat me with a better, more readable one-liner! ;)

342

asked May 20 '10 01:05

mykhal

3 Answers

Python 3.x does have reduce, you just have to do a from functools import reduce. It also has "dict comprehensions", which have exactly the syntax in your example.
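For illustration, here is roughly how both of those could look in Python 3. This is a sketch, not a recommendation: the `{**d, ...}` merge is my own substitution for the `dict(d.items() + [...])` trick in the question and needs Python 3.5 or later, and it still copies the dict on every step.

```python
from functools import reduce  # reduce lives in functools in Python 3

L = 'abracadabra'

# Dict comprehension, same syntax as in the question (L must have .count).
by_count = {x: L.count(x) for x in set(L)}

# reduce that builds a new dict per element; works on any iterable.
by_reduce = reduce(lambda d, x: {**d, x: d.get(x, 0) + 1}, L, {})

print(by_count == by_reduce)  # True
```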

Python 2.7 and 3.x also have a Counter class which does exactly what you want:

from collections import Counter
cnt = Counter("abracadabra")
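A short sketch of what you get back (Python 3 syntax for `print`). `Counter` is a `dict` subclass, so it compares equal to the plain dictionary from the question, and it accepts generators as well as sequences.

```python
from collections import Counter

cnt = Counter("abracadabra")

print(cnt['a'])            # 5
print(cnt['z'])            # 0, missing keys count as zero
print(cnt.most_common(1))  # [('a', 5)]

# Works on a single-use generator too.
cnt_gen = Counter(ch for ch in "abracadabra")
print(cnt_gen == cnt)      # True
```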

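In Python 2.6 or earlier (defaultdict needs 2.5+), I'd personally use a defaultdict and do it in 2 lines, plus the import:

from collections import defaultdict

d = defaultdict(int)
for x in xs: d[x] += 1  # xs is the iterable to count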

That's clean, efficient, Pythonic, and much easier for most people to understand than anything involving reduce.
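Wrapped up as the `histogram` function from the question, the same idea might look like the sketch below. Returning a plain `dict` at the end is my own choice here, so that later lookups of missing keys don't silently add entries.

```python
from collections import defaultdict

def histogram(xs):
    d = defaultdict(int)   # missing keys start at int() == 0
    for x in xs:           # single pass, so generators are fine
        d[x] += 1
    return dict(d)         # hand back an ordinary dict

print(histogram('abracadabra'))
print(histogram(ch for ch in 'abracadabra'))
```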

160

answered Sep 21 '22 03:09

Eli Courtwright

It's kinda cheaty to import modules for one-liners, so here's a one-liner that is O(n) and works at least as far back as Python 2.4:

>>> f=lambda s,d={}:([d.__setitem__(i,d.get(i,0)+1) for i in s],d)[-1]
>>> f("ABRACADABRA")
{'A': 5, 'R': 2, 'B': 2, 'C': 1, 'D': 1}

And if you think __ methods are hacky, you can always do this

>>> f=lambda s,d=lambda:0:vars(([setattr(d,i,getattr(d,i,0)+1) for i in s],d)[-1])
>>> f("ABRACADABRA")
{'A': 5, 'R': 2, 'B': 2, 'C': 1, 'D': 1}
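One caveat worth knowing about with the first one-liner: the default `d={}` is created once and shared between calls, so counts pile up if you call `f` twice without passing your own dict. The sketch below (Python 3 syntax; `first`, `second` and `fresh` are just illustrative names) shows the effect and one way around it.

```python
f = lambda s, d={}: ([d.__setitem__(i, d.get(i, 0) + 1) for i in s], d)[-1]

# f returns the shared default dict itself, so copy it to keep a snapshot.
first = dict(f("ABRACADABRA"))
second = dict(f("ABRACADABRA"))
print(first['A'], second['A'])  # 5 10

# Passing a fresh dict explicitly avoids the shared state.
fresh = f("ABRACADABRA", {})
print(fresh['A'])  # 5
```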

answered Sep 25 '22 03:09

John La Rooy

$d{$_} += 1 for split //, 'abracadabra';

answered Sep 21 '22 03:09

perl
