Find the year with the most number of people alive in Python

Question

Given a list of people with their birth and end years (all between 1900 and 2000), find the year with the most number of people alive.

Here is my somewhat brute-force solution:

def most_populated(population, single=True):
    years = dict()
    for person in population:
        for year in xrange(person[0], person[1]):
            if year in years:
                years[year] += 1
            else:
                years[year] = 0
    return max(years, key=years.get) if single else \
           [key for key, val in years.iteritems() if val == max(years.values())]

print most_populated([(1920, 1939), (1911, 1944),
                      (1920, 1955), (1938, 1939)])
print most_populated([(1920, 1939), (1911, 1944),
                      (1920, 1955), (1938, 1939), (1937, 1940)], False)

I'm trying to find a more efficient way to solve this problem in Python. Both - readability and efficiency counts. Moreover, for some reason my code won't print [1938, 1939] while it should.

Update

Input is a list of tuples, where first element of a tuple is a year when person was born, and second element of a tuple is the year of death.

Update 2

End year (2nd part of tuple) counts as well as a year of the person being alive (so If the person dies in Sept 1939 (we don't care about the month), he is actually alive in 1939, at least part of it). That should fix the 1939' missing in results.

Best solution?

While readability counts in favor of @joran-beasley, for bigger input most efficient algorithm was provided by @njzk2. Thanks @hannes-ovrén for providing analysis in IPython notebook on Gist

njzk2 · Accepted Answer

I would go like this:

Sort persons by birth year (unborn list)
Starting from the first born
- Put that person in the alive list
- Using an insertion sort by date of death (the list stays sorted, so use a binary search)
- Until you reach a person that was not born that year
Then, starting from the person in the alive list that dies first, remove it from the list.
Put the size of the alive list in a dict
Increment the year
Loop until the unborn and alive lists are empty

Complexity should be around O((m + n) * log(m)) (each year is considered only once, and each person only twice, multiplied by the insertion cost in the alive list)

Implementation

from bisect import insort

def most_populated(population, single=True):
    years = dict()
    unborn = sorted(population, key=lambda x: -x[0])
    alive = []
    dead = []
    for year in range(unborn[-1][0], max(population, key=lambda x: x[1])[1] + 1):
        while unborn and unborn[-1][0] == year:
            insort(alive, -unborn.pop()[1])
        while alive and alive[-1] == -(year - 1):
            dead.append(-alive.pop())
        years[year] = len(alive)
    return max(years, key=years.get) if single else \
           [key for key, val in years.iteritems() if val == max(years.values())]

Joran Beasley · Answer

>>> from collections import Counter
>>> from itertools import chain
>>> def most_pop(pop):
...     pop_flat = chain.from_iterable(range(i,j+1) for i,j in pop)
...     return Counter(pop_flat).most_common()
...
>>> most_pop([(1920, 1939), (1911, 1944), (1920, 1955), (1938, 1939)])[0]

Hannes Ovrén · Answer

We can also use numpy slicing, which is quite neat, and should also be quite efficient:

import numpy as np
from collections import namedtuple

Person = namedtuple('Person', ('birth', 'death'))
people = [Person(1900,2000), Person(1950,1960), Person(1955, 1959)]

START_YEAR = 1900
END_YEAR = 2000
people_alive = np.zeros(END_YEAR - START_YEAR + 1) # Alive each year

for p in people:
    a = p.birth - START_YEAR
    b = p.death - START_YEAR + 1 # include year of death
    people_alive[a:b] += 1

# Find indexes of maximum aliveness and convert to year
most_alive = np.flatnonzero(people_alive == people_alive.max()) + START_YEAR

EDIT It seems like the namedtuple adds a bit of overhead, so to speed up a bit more, remove the namedtuple and do for birth, death in people: instead.

H Murat YILDIRIM · Answer

Just put the birth and death years into a dict. If it is birth, increase the value by 1. or vice versa.
Sort the dict by keys and iterate by reading the current number of the alive people.

Follow the 'maxAlive' an 'theYear' to get the first year with the highest number

years = {} 

for p in people:
    if p.birth in years:
        years[p.birth] += 1
    else:
        years[p.birth] = 1

    if p.death in years:
        years[p.death] -= 1
    else:
        years[p.death] = -1

alive = 0
maxAlive = 0
theYear = people[0].birth
for year in sorted(years):
    alive += years[year]
    if alive > maxAlive:
        maxAlive = alive
        theYear = year

Find the year with the most number of people alive in Python

Tags:

python

algorithm

oski86

4 Answers

Implementation

njzk2

Joran Beasley

Hannes Ovrén

H Murat YILDIRIM

Recent Activity

Donate For Us

Find the year with the most number of people alive in Python

Tags:

python

algorithm

oski86

4 Answers

Implementation

njzk2

Joran Beasley

Hannes Ovrén

H Murat YILDIRIM

Related questions

Recent Activity

Donate For Us