 

Fast algorithm for finding all people alive at a given time?

Tags:

java

algorithm

Assume you have something like

class Person {
  LocalDate bornOn;
  LocalDate diedOn;
}

Let's say you have a number of Person instances that you can store any way you like.

What's the best way of writing an efficient function that can list all people that were alive at a given time?

The data structure should be efficiently mutable as well, in particular in terms of adding new elements.

I.e. conceptually something like

List<Person> alive(List<Person> people, LocalDate date) {
  return people.stream()
      .filter(x -> x.bornOn.compareTo(date) <= 0 && x.diedOn.compareTo(date) > 0)
      .collect(Collectors.toList());
}

Only more efficient.

My first gut feeling would be to have two NavigableMaps:

NavigableMap<LocalDate, Person> peopleSortedByBornOn;
NavigableMap<LocalDate, Person> peopleSortedByDiedOn;

Each could be queried with headMap() / tailMap() for the given date, and the intersection of those two queries would be the result.
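The two-map idea can be sketched as follows. Note that a plain NavigableMap<LocalDate, Person> would drop people who share a date, so this hypothetical sketch maps each date to a list instead, and uses plain String names in place of Person for brevity (the class and method names are my own, not from the question):

```java
import java.time.LocalDate;
import java.util.*;

class TwoMapIndex {
    // Each date maps to a list, since several people may share a birth or death date.
    final NavigableMap<LocalDate, List<String>> byBornOn = new TreeMap<>();
    final NavigableMap<LocalDate, List<String>> byDiedOn = new TreeMap<>();

    void add(String name, LocalDate bornOn, LocalDate diedOn) {
        byBornOn.computeIfAbsent(bornOn, k -> new ArrayList<>()).add(name);
        byDiedOn.computeIfAbsent(diedOn, k -> new ArrayList<>()).add(name);
    }

    // People with bornOn <= date (headMap, inclusive),
    // intersected with people with diedOn > date (tailMap, exclusive).
    Set<String> alive(LocalDate date) {
        Set<String> bornBefore = new HashSet<>();
        byBornOn.headMap(date, true).values().forEach(bornBefore::addAll);

        Set<String> result = new HashSet<>();
        for (List<String> names : byDiedOn.tailMap(date, false).values()) {
            for (String name : names) {
                if (bornBefore.contains(name)) {
                    result.add(name);
                }
            }
        }
        return result;
    }
}
```

Adding a person costs O(log n) per map, but note the query itself can still touch O(n) entries when most people fall on one side of the date, so this mainly buys convenience rather than an asymptotic win.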

Is there any faster, or significantly more convenient solution though? Maybe even some sort of widely used Java collection/map type that'd support such an operation?

Bogey asked Jan 26 '23 17:01

2 Answers

For theoretical interest, I would like to mention geometric data structures, such as quad trees. Treat each person as a point with (born, died) coordinates, where died >= born.

    d         b=d
    |    | - /
    | +  |  /
    |    | /
  D |____|/
    |   /:
    |- / :
    | /  :
    |/___:_____ b
         D

All points lie in the upper triangle, and + marks the rectangular region containing everyone alive at date D. The rectangle is open-ended to the left and at the top.

A geometric data structure would do the job, and there are databases that can handle such geometric queries natively.

I would love to see an implementation, though I would not bet on a speed advantage, except perhaps for very large datasets.

Joop Eggen answered Jan 30 '23 03:01


Given the clarified constraints, I would keep it simple and use a map that holds references to the people alive on each given day, effectively creating an index.

Map<LocalDate,LinkedList<Person>> aliveMap;

Each put costs O(1) for the map plus O(1) for the LinkedList (though registering a person requires one put per day of their life). Get, on the other hand, is as good as it gets: O(1), assuming a good hash function.

Memory-wise, you would incur the cost of the extra references, which can be significant: roughly 80 years × 365 days × 8 bytes per reference on a 64-bit VM, or about 233,600 bytes per person.

This approach yields optimal performance on get operations, probably the worst memory footprint, and average cost on put operations.
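A minimal sketch of the full daily index, assuming String names in place of Person and an ArrayList instead of a LinkedList for brevity (the class name DailyAliveIndex is my own):

```java
import java.time.LocalDate;
import java.util.*;

class DailyAliveIndex {
    // date -> everyone alive on that exact day
    final Map<LocalDate, List<String>> aliveMap = new HashMap<>();

    // One O(1) map put per day of the person's life,
    // which is where the per-person memory cost comes from.
    void add(String name, LocalDate bornOn, LocalDate diedOn) {
        for (LocalDate d = bornOn; d.isBefore(diedOn); d = d.plusDays(1)) {
            aliveMap.computeIfAbsent(d, k -> new ArrayList<>()).add(name);
        }
    }

    // O(1) lookup plus the size of the result.
    List<String> alive(LocalDate date) {
        return aliveMap.getOrDefault(date, List.of());
    }
}
```

The get is a single hash lookup returning an already-materialised list, which is the whole point of paying the insertion and memory cost up front.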

Variation: Instead of creating the full index, you can create coarser buckets, e.g. yearly, where you first fetch everyone alive at some point in the given year and then filter out those already dead on the exact date.

Map<Integer,LinkedList<Person>> aliveMap;
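The yearly-bucket variation can be sketched like this, again with a small record standing in for Person (the names YearlyAliveIndex and P are illustrative, not from the answer):

```java
import java.time.LocalDate;
import java.util.*;
import java.util.stream.Collectors;

class YearlyAliveIndex {
    record P(String name, LocalDate bornOn, LocalDate diedOn) {}

    // year -> everyone alive at some point during that year
    final Map<Integer, List<P>> buckets = new HashMap<>();

    // One put per calendar year of the person's life: ~365x fewer
    // entries than the full daily index.
    void add(P p) {
        for (int y = p.bornOn().getYear(); y <= p.diedOn().getYear(); y++) {
            buckets.computeIfAbsent(y, k -> new ArrayList<>()).add(p);
        }
    }

    // Fetch the year's bucket, then filter out anyone not alive on the exact day.
    List<String> alive(LocalDate date) {
        return buckets.getOrDefault(date.getYear(), List.of()).stream()
                .filter(p -> !p.bornOn().isAfter(date) && p.diedOn().isAfter(date))
                .map(P::name)
                .collect(Collectors.toList());
    }
}
```

This trades a small filtering pass at query time for a roughly 365-fold reduction in index entries per person.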

NB: I assume your data spans hundreds of years and does not cover the whole world population (~7.5 billion). If you were only looking at a 50-100 year window, more efficient specialisations may exist.

Ioannis Deligiannis answered Jan 30 '23 04:01