Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Family Tree Algorithm

I'm working on putting together a problem set for an intro-level CS course and came up with a question that, on the surface, seems very simple:

You are given a list of people with the names of their parents, their birth dates, and their death dates. You are interested in finding out who, at some point in their lifetime, was a parent, a grandparent, a great-grandparent, etc. Devise an algorithm to label each person with this information as an integer (0 means the person never had a child, 1 means that the person was a parent, 2 means that the person was a grandparent, etc.)

For simplicity, you can assume that the family graph is a DAG whose undirected version is a tree.

The interesting challenge here is that you can't just look at the shape of the tree to determine this information. For example, I have 8 great-great-grandparents, but since none of them were alive when I was born, in their lifetimes none of them were great-great-grandparents.

The best algorithm I can come up with for this problem runs in time O(n2), where n is the number of people. The idea is simple - start a DFS from each person, finding the furthest descendant down in the family tree that was born before that person's death date. However, I'm pretty sure that this is not the optimal solution to the problem. For example, if the graph is just two parents and their n children, then the problem can be solved trivially in O(n). What I'm hoping for is some algorithm that is either beats O(n2) or whose runtime is parameterized over the shape of the graph that makes it fast for wide graphs with a graceful degradation to O(n2) in the worst-case.

like image 654
templatetypedef Avatar asked May 23 '11 19:05

templatetypedef


People also ask

How is a family tree structured?

A family tree is a visual representation of a person's lineage, tracing relationships to common ancestors. Visually similar to an org chart, this diagram is usually presented in a tree structure starting with one individual as the root. From the root, lines representing branches terminate in boxes representing leaves.

Which data structure is used for family tree?

A general graph structure would be the best (a tree being a specific form of a graph). The edge would carry the relationship. Then you can run a path finding algorithm (like good old dijkstra) only on edges which represent blood relationship.

How do you make a family tree step by step?

To draw a family tree, research your family history by asking family members for information about your relatives. Then, draw a tree or diagram on a large piece of paper. Next, write your name on one of the limbs and add your parents and siblings to the limbs closest to you.

Who is the first generation in a family tree?

Your grandparents and their siblings make up a third. The top level of the family tree is the first generation, followed by their children (second generation) and so on, assigning each successive generation a higher number - third, fourth, fifth.


1 Answers

Update: This is not the best solution I have come up with, but I've left it because there are so many comments relating to it.

You have a set of events (birth/death), parental state (no descendants, parent, grandparent, etc) and life state (alive, dead).

I would store my data in structures with the following fields:

mother father generations is_alive may_have_living_ancestor 

Sort your events by date, and then for each event take one of the following two courses of logic:

Birth:     Create new person with a mother, father, 0 generations, who is alive and may         have a living ancestor.     For each parent:         If generations increased, then recursively increase generations for             all living ancestors whose generations increased.  While doing that,             set the may_have_living_ancestor flag to false for anyone for whom it is             discovered that they have no living ancestors.  (You only iterate into             a person's ancestors if you increased their generations, and if they             still could have living ancestors.)  Death:     Emit the person's name and generations.     Set their is_alive flag to false. 

The worst case is O(n*n) if everyone has a lot of living ancestors. However in general you've got the sorting preprocessing step which is O(n log(n)) and then you're O(n * avg no of living ancestors) which means that the total time tends to be O(n log(n)) in most populations. (I hadn't counted the sorting prestep properly, thanks to @Alexey Kukanov for the correction.)

like image 197
btilly Avatar answered Sep 22 '22 17:09

btilly