There are many ways to write a Python program that computes a histogram.
By histogram, I mean a function that counts the occurrence of objects in an iterable
and outputs the counts in a dictionary. For example:
>>> L = 'abracadabra'
>>> histogram(L)
{'a': 5, 'b': 2, 'c': 1, 'd': 1, 'r': 2}
One way to write this function is:
def histogram(L):
d = {}
for x in L:
if x in d:
d[x] += 1
else:
d[x] = 1
return d
Are there more concise ways of writing this function?
If we had dictionary comprehensions in Python, we could write:
>>> { x: L.count(x) for x in set(L) }
but since Python 2.6 doesn't have them, we have to write:
>>> dict([(x, L.count(x)) for x in set(L)])
Although this approach may be readable, it is not efficient: L is walked-through multiple times. Furthermore, this won't work for single-life generators; the function should work equally well for iterator generators such as:
def gen(L):
for x in L:
yield x
We might try to use the reduce
function (R.I.P.):
>>> reduce(lambda d,x: dict(d, x=d.get(x,0)+1), L, {}) # wrong!
Oops, this does not work: the key name is 'x'
, not x
. :(
I ended with:
>>> reduce(lambda d,x: dict(d.items() + [(x, d.get(x, 0)+1)]), L, {})
(In Python 3, we would have to write list(d.items())
instead of d.items()
, but it's hypothethical, since there is no reduce
there.)
Please beat me with a better, more readable one-liner! ;)
We can achieve this by increasing the number of bins, which is essentially the number of classes the histogram divides the data into. More bins will make the histogram smoother.
To normalize a histogram in Python, we can use hist() method. In normalized bar, the area underneath the plot should be 1.
Python 3.x does have reduce
, you just have to do a from functools import reduce
. It also has "dict comprehensions", which have exactly the syntax in your example.
Python 2.7 and 3.x also have a Counter class which does exactly what you want:
from collections import Counter
cnt = Counter("abracadabra")
In Python 2.6 or earlier, I'd personally use a defaultdict and do it in 2 lines:
d = defaultdict(int)
for x in xs: d[x] += 1
That's clean, efficient, Pythonic, and much easier for most people to understand than anything involving reduce
.
It's kinda cheaty to import modules for oneliners, so here's a oneliner that is O(n) and works at least as far back as Python2.4
>>> f=lambda s,d={}:([d.__setitem__(i,d.get(i,0)+1) for i in s],d)[-1]
>>> f("ABRACADABRA")
{'A': 5, 'R': 2, 'B': 2, 'C': 1, 'D': 1}
And if you think __
methods are hacky, you can always do this
>>> f=lambda s,d=lambda:0:vars(([setattr(d,i,getattr(d,i,0)+1) for i in s],d)[-1])
>>> f("ABRACADABRA")
{'A': 5, 'R': 2, 'B': 2, 'C': 1, 'D': 1}
:)
$d{$_} += 1 for split //, 'abracadabra';
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With