Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Find the year with the most number of people alive in Python

Given a list of people with their birth and end years (all between 1900 and 2000), find the year with the most number of people alive.

Here is my somewhat brute-force solution:

def most_populated(population, single=True):
    years = dict()
    for person in population:
        for year in xrange(person[0], person[1]):
            if year in years:
                years[year] += 1
            else:
                years[year] = 0
    return max(years, key=years.get) if single else \
           [key for key, val in years.iteritems() if val == max(years.values())]

print most_populated([(1920, 1939), (1911, 1944),
                      (1920, 1955), (1938, 1939)])
print most_populated([(1920, 1939), (1911, 1944),
                      (1920, 1955), (1938, 1939), (1937, 1940)], False)

I'm trying to find a more efficient way to solve this problem in Python. Both - readability and efficiency counts. Moreover, for some reason my code won't print [1938, 1939] while it should.

Update

Input is a list of tuples, where first element of a tuple is a year when person was born, and second element of a tuple is the year of death.

Update 2

End year (2nd part of tuple) counts as well as a year of the person being alive (so If the person dies in Sept 1939 (we don't care about the month), he is actually alive in 1939, at least part of it). That should fix the 1939' missing in results.

Best solution?

While readability counts in favor of @joran-beasley, for bigger input most efficient algorithm was provided by @njzk2. Thanks @hannes-ovrén for providing analysis in IPython notebook on Gist

like image 887
oski86 Avatar asked Jul 20 '15 17:07

oski86


4 Answers

I would go like this:

  • Sort persons by birth year (unborn list)
  • Starting from the first born
    • Put that person in the alive list
    • Using an insertion sort by date of death (the list stays sorted, so use a binary search)
    • Until you reach a person that was not born that year
  • Then, starting from the person in the alive list that dies first, remove it from the list.
  • Put the size of the alive list in a dict
  • Increment the year
  • Loop until the unborn and alive lists are empty

Complexity should be around O((m + n) * log(m)) (each year is considered only once, and each person only twice, multiplied by the insertion cost in the alive list)

Implementation

from bisect import insort

def most_populated(population, single=True):
    years = dict()
    unborn = sorted(population, key=lambda x: -x[0])
    alive = []
    dead = []
    for year in range(unborn[-1][0], max(population, key=lambda x: x[1])[1] + 1):
        while unborn and unborn[-1][0] == year:
            insort(alive, -unborn.pop()[1])
        while alive and alive[-1] == -(year - 1):
            dead.append(-alive.pop())
        years[year] = len(alive)
    return max(years, key=years.get) if single else \
           [key for key, val in years.iteritems() if val == max(years.values())]
like image 186
njzk2 Avatar answered Nov 10 '22 22:11

njzk2


>>> from collections import Counter
>>> from itertools import chain
>>> def most_pop(pop):
...     pop_flat = chain.from_iterable(range(i,j+1) for i,j in pop)
...     return Counter(pop_flat).most_common()
...
>>> most_pop([(1920, 1939), (1911, 1944), (1920, 1955), (1938, 1939)])[0]
like image 26
Joran Beasley Avatar answered Nov 10 '22 22:11

Joran Beasley


We can also use numpy slicing, which is quite neat, and should also be quite efficient:

import numpy as np
from collections import namedtuple

Person = namedtuple('Person', ('birth', 'death'))
people = [Person(1900,2000), Person(1950,1960), Person(1955, 1959)]

START_YEAR = 1900
END_YEAR = 2000
people_alive = np.zeros(END_YEAR - START_YEAR + 1) # Alive each year

for p in people:
    a = p.birth - START_YEAR
    b = p.death - START_YEAR + 1 # include year of death
    people_alive[a:b] += 1

# Find indexes of maximum aliveness and convert to year
most_alive = np.flatnonzero(people_alive == people_alive.max()) + START_YEAR

EDIT It seems like the namedtuple adds a bit of overhead, so to speed up a bit more, remove the namedtuple and do for birth, death in people: instead.

like image 2
Hannes Ovrén Avatar answered Nov 10 '22 22:11

Hannes Ovrén


  1. Just put the birth and death years into a dict. If it is birth, increase the value by 1. or vice versa.
  2. Sort the dict by keys and iterate by reading the current number of the alive people.
  3. Follow the 'maxAlive' an 'theYear' to get the first year with the highest number

    years = {} 
    
    for p in people:
        if p.birth in years:
            years[p.birth] += 1
        else:
            years[p.birth] = 1
    
        if p.death in years:
            years[p.death] -= 1
        else:
            years[p.death] = -1
    
    alive = 0
    maxAlive = 0
    theYear = people[0].birth
    for year in sorted(years):
        alive += years[year]
        if alive > maxAlive:
            maxAlive = alive
            theYear = year
    
like image 1
H Murat YILDIRIM Avatar answered Nov 10 '22 21:11

H Murat YILDIRIM