Given a list of people with their birth and end years (all between 1900
and 2000
), find the year with the most number of people alive.
Here is my somewhat brute-force solution:
def most_populated(population, single=True):
years = dict()
for person in population:
for year in xrange(person[0], person[1]):
if year in years:
years[year] += 1
else:
years[year] = 0
return max(years, key=years.get) if single else \
[key for key, val in years.iteritems() if val == max(years.values())]
print most_populated([(1920, 1939), (1911, 1944),
(1920, 1955), (1938, 1939)])
print most_populated([(1920, 1939), (1911, 1944),
(1920, 1955), (1938, 1939), (1937, 1940)], False)
I'm trying to find a more efficient way to solve this problem in Python
. Both - readability
and efficiency
counts. Moreover, for some reason my code won't print [1938, 1939]
while it should.
Update
Input is a list
of tuples, where first element of a tuple is a year
when person was born, and second element of a tuple
is the year of death.
Update 2
End year (2nd part of tuple) counts as well as a year of the person being alive (so If the person dies in Sept 1939
(we don't care about the month), he is actually alive in 1939, at least part of it). That should fix the 1939' missing in results.
Best solution?
While readability counts in favor of @joran-beasley, for bigger input most efficient algorithm was provided by @njzk2. Thanks @hannes-ovrén for providing analysis in IPython notebook on Gist
I would go like this:
unborn
list)alive
listalive
list that dies first, remove it from the list.alive
list in a dictunborn
and alive
lists are emptyComplexity should be around O((m + n) * log(m))
(each year is considered only once, and each person only twice, multiplied by the insertion cost in the alive
list)
from bisect import insort
def most_populated(population, single=True):
years = dict()
unborn = sorted(population, key=lambda x: -x[0])
alive = []
dead = []
for year in range(unborn[-1][0], max(population, key=lambda x: x[1])[1] + 1):
while unborn and unborn[-1][0] == year:
insort(alive, -unborn.pop()[1])
while alive and alive[-1] == -(year - 1):
dead.append(-alive.pop())
years[year] = len(alive)
return max(years, key=years.get) if single else \
[key for key, val in years.iteritems() if val == max(years.values())]
>>> from collections import Counter
>>> from itertools import chain
>>> def most_pop(pop):
... pop_flat = chain.from_iterable(range(i,j+1) for i,j in pop)
... return Counter(pop_flat).most_common()
...
>>> most_pop([(1920, 1939), (1911, 1944), (1920, 1955), (1938, 1939)])[0]
We can also use numpy slicing, which is quite neat, and should also be quite efficient:
import numpy as np
from collections import namedtuple
Person = namedtuple('Person', ('birth', 'death'))
people = [Person(1900,2000), Person(1950,1960), Person(1955, 1959)]
START_YEAR = 1900
END_YEAR = 2000
people_alive = np.zeros(END_YEAR - START_YEAR + 1) # Alive each year
for p in people:
a = p.birth - START_YEAR
b = p.death - START_YEAR + 1 # include year of death
people_alive[a:b] += 1
# Find indexes of maximum aliveness and convert to year
most_alive = np.flatnonzero(people_alive == people_alive.max()) + START_YEAR
EDIT It seems like the namedtuple adds a bit of overhead, so to speed up a bit more, remove the namedtuple and do
for birth, death in people:
instead.
Follow the 'maxAlive' an 'theYear' to get the first year with the highest number
years = {}
for p in people:
if p.birth in years:
years[p.birth] += 1
else:
years[p.birth] = 1
if p.death in years:
years[p.death] -= 1
else:
years[p.death] = -1
alive = 0
maxAlive = 0
theYear = people[0].birth
for year in sorted(years):
alive += years[year]
if alive > maxAlive:
maxAlive = alive
theYear = year
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With