
Creating a Family Tree with Neo4j

Tags: json, neo4j

I have a set of data for a family tree in Neo4j and am trying to build a Cypher query that produces a JSON data set similar to the following:

{"Name": "Bob",
 "parents": [
     {"Name": "Roger",
      "parents": [
          {"Name": "Robert"},
          {"Name": "Jessica"}
      ]},
     {"Name": "Susan",
      "parents": [
          {"Name": "George"},
          {"Name": "Susan"}
      ]}
 ]}

My graph has a PARENT relationship between Member nodes (i.e. MATCH (p:Member)-[:PARENT]->(c:Member)). I found the questions "Nested has_many relationships in cypher" and "neo4j cypher nested collect", but their approach ends up grouping all parents together under the main child node I am searching for.

Adding some clarity based on feedback:

Every member has a unique identifier. The unions are currently all associated with the PARENT relationship. Everything is indexed, so performance will not suffer. When I run a query that just returns the node graph, I get the results I expect. I'm trying to return output that I can use for visualization with D3. Ideally this will be done with a Cypher query, since I'm using the API to access Neo4j from the frontend being built.

Adding a sample query:

MATCH (p:Person)-[:PARENT*1..5]->(c:Person)
WHERE c.FirstName = 'Bob'
RETURN p.FirstName, c.FirstName

This query returns a list of each parent for five generations, but instead of showing the hierarchy, it's listing 'Bob' as the child for each relationship. Is there a Cypher query that would show each relationship in the data at least? I can format it as I need to from there...
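Assuming a query can return one (parent id, child id) pair per PARENT relationship, using the unique member identifiers mentioned above, the nesting itself is a short recursive walk on the client. The following is a minimal Python sketch, not a Neo4j feature: build_tree, the sample ids and the id-to-name lookup are hypothetical, and the Name/parents keys simply mirror the desired JSON.

```python
import json
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Tuple


def build_tree(root_id: int, edges: Iterable[Tuple[int, int]],
               names: Dict[int, str], max_depth: int = 5) -> Dict[str, Any]:
    """Nest ancestors under root_id as {"Name": ..., "parents": [...]}.

    edges: one (parent_id, child_id) pair per PARENT relationship.
    names: member id -> display name (hypothetical lookup).
    max_depth: generations to follow, mirroring PARENT*1..5.
    """
    parents_of: Dict[int, List[int]] = defaultdict(list)
    for parent_id, child_id in edges:
        if parent_id not in parents_of[child_id]:  # skip duplicate rows
            parents_of[child_id].append(parent_id)

    def node(member_id: int, depth: int) -> Dict[str, Any]:
        entry: Dict[str, Any] = {"Name": names[member_id]}
        parents = parents_of.get(member_id, [])
        if depth < max_depth and parents:
            entry["parents"] = [node(p, depth + 1) for p in parents]
        return entry

    return node(root_id, 0)


# Hypothetical ids: the two "Susan"s are different people.
names = {1: "Bob", 2: "Roger", 3: "Susan", 4: "Robert",
         5: "Jessica", 6: "George", 7: "Susan"}
edges = [(2, 1), (3, 1), (4, 2), (5, 2), (6, 3), (7, 3)]
tree = build_tree(1, edges, names)
print(json.dumps(tree))
```

Keying on ids rather than names keeps the two Susans apart, and max_depth plays the role of the *1..5 bound in the query.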

asked Jan 07 '15 20:01 by OpenDataAlex


4 Answers

Genealogical data might comply with the GEDCOM standard and include two types of nodes: Person and Union. A Person node has its identifier and the usual demographic facts; a Union node has a union_id and the facts about the union. In GEDCOM, Family is a third element that brings these two together, but in Neo4j I found it suitable to also include the union_id in Person nodes. I used five relationships: father, mother, husband, wife and child. A family is then two parents, each with an inward vector, and each child with an outward vector. The image illustrates this.

This is very handy for visualizing connections and generating hypotheses. For example, consider the attached picture and my ancestor Edward G Campbell, the product of union 1917, where three brothers married three Vaught sisters from union 8944 and two married Gaither sisters from union 2945. Also, in the upper left, notice how Mahala Campbell married her step-brother John Greer Armstrong. Next to Mahala is an Elizabeth Campbell who is connected by marriage to other Campbells but is likely directly related to them. Similarly, you can hypothesize about Rachael Jacobs in the upper right and how she might relate to the other Jacobs. Notice the query. From the few initial nodes visualized, you can click to open others.

I use bulk inserts, which can populate ~30,000 Person nodes and ~100,000 relationships in just over a minute. I have a small .NET function that returns the JSON from a dataview; this generic solution works with any dataview, so it is scalable. I'm now working on adding other data, such as locations (lat/long), documentation (particularly documents linking people, such as a census), etc.
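As a rough illustration of how this Person/Union layout relates to the single PARENT relationship in the question, here is a minimal Python sketch. It assumes (the answer does not spell this out) that a Person's union_id names the union that person was born into and that each union stores the ids of its husband and wife; the classes, field names and ids are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Person:
    person_id: int
    union_id: Optional[int]  # assumed meaning: the union this person was born into


@dataclass
class UnionNode:
    union_id: int
    husband_id: Optional[int]  # None when a partner is unknown
    wife_id: Optional[int]


def parent_edges(people: List[Person], unions: List[UnionNode]) -> List[Tuple[int, int]]:
    """Derive (parent_id, child_id) pairs from each person's birth union."""
    unions_by_id: Dict[int, UnionNode] = {u.union_id: u for u in unions}
    edges: List[Tuple[int, int]] = []
    for person in people:
        union = unions_by_id.get(person.union_id)  # None if no birth union is recorded
        if union is None:
            continue
        for parent_id in (union.husband_id, union.wife_id):
            if parent_id is not None:
                edges.append((parent_id, person.person_id))
    return edges


# Hypothetical ids: person 1 was born into union 1917 of persons 10 and 11.
unions = [UnionNode(1917, husband_id=10, wife_id=11)]
people = [Person(1, 1917), Person(10, None), Person(11, None)]
print(parent_edges(people, unions))  # [(10, 1), (11, 1)]
```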

answered Oct 19 '22 22:10 by David A Stumpf


You might also have a look at Rik van Bruggen's blog on his family data.

Regarding your query:

You already create a path pattern here: (p:Person)-[:PARENT*1..5]->(c:Person). You can assign it to a variable, tree, and then operate on that variable: return the path itself, nodes(tree) or relationships(tree), or process those collections in other ways:

MATCH tree = (p:Person)-[:PARENT*1..5]->(c:Person)
WHERE c.FirstName = 'Bob'
RETURN nodes(tree), relationships(tree), tree, length(tree),
       [n IN nodes(tree) | n.FirstName] AS names
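Because the pattern runs from ancestor to descendant, each names row lists one line of descent ending at Bob, and the individual PARENT relationships the question asks for can be recovered by pairing neighboring entries. A minimal Python sketch (the edges_from_paths helper and the sample rows are hypothetical):

```python
from typing import Iterable, List, Sequence, Set, Tuple


def edges_from_paths(paths: Iterable[Sequence[str]]) -> List[Tuple[str, str]]:
    """Turn node lists from (p)-[:PARENT*1..5]->(c) paths into unique (parent, child) pairs.

    The pattern runs from ancestor to descendant, so element i of each list
    is a parent of element i + 1.
    """
    seen: Set[Tuple[str, str]] = set()
    edges: List[Tuple[str, str]] = []
    for path in paths:
        for parent, child in zip(path, path[1:]):
            # Paths share their lower generations, so the same edge recurs.
            if (parent, child) not in seen:
                seen.add((parent, child))
                edges.append((parent, child))
    return edges


# Hypothetical rows of the names column for Bob.
rows = [["Roger", "Bob"], ["Robert", "Roger", "Bob"], ["Susan", "Bob"]]
print(edges_from_paths(rows))
# [('Roger', 'Bob'), ('Robert', 'Roger'), ('Susan', 'Bob')]
```

With real data it is safer to return the members' unique identifiers instead of first names, since a name such as Susan can occur in more than one generation.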

See also the Cypher reference card (http://neo4j.com/docs/stable/cypher-refcard) and the online training (http://neo4j.com/online-training) to learn more about Cypher.

Don't forget to create an index on the property you look people up by:

CREATE INDEX ON :Person(FirstName);

answered Oct 19 '22 23:10 by Michael Hunger


I'd suggest building a method to flatten out your data into an array. If the objects don't have UUIDs, you would probably want to give them IDs as you flatten and then have a parent_id key for each record.

You can then run it as a set of cypher queries (either making multiple requests to the query REST API, or using the batch REST API) or alternatively dump the data to CSV and use cypher's LOAD CSV command to load the objects.

An example Cypher statement with parameters would be:

CREATE (:Member {uuid: $uuid, name: $name})

And then run through the list again with the child (uuid1) and parent (uuid2) IDs:

// m2 is the parent of m1
MATCH (m1:Member {uuid: $uuid1}), (m2:Member {uuid: $uuid2})
CREATE (m1)<-[:PARENT]-(m2)

Make sure to have an index on the ID for members!
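A minimal Python sketch of the flattening step, assuming the nested Name/parents shape from the question. The flatten helper is hypothetical; it produces the uuid/name maps for the CREATE statement and the uuid1 (child) / uuid2 (parent) maps for the PARENT statement. Sending them to Neo4j (one request each, the batch API, or a CSV file for LOAD CSV) is not shown.

```python
import uuid
from typing import Any, Dict, List, Optional, Set, Tuple


def flatten(tree: Dict[str, Any]) -> Tuple[List[Dict[str, str]], List[Dict[str, str]]]:
    """Flatten a nested {"Name": ..., "parents": [...]} tree into query parameters.

    members: {"uuid", "name"} maps for the CREATE (:Member ...) statement.
    links:   {"uuid1", "uuid2"} maps for the PARENT statement, where uuid1 is
             the child and uuid2 the parent, matching (m1)<-[:PARENT]-(m2).
    """
    members: List[Dict[str, str]] = []
    links: List[Dict[str, str]] = []
    seen: Set[str] = set()

    def visit(node: Dict[str, Any], child_uuid: Optional[str]) -> None:
        # Reuse an existing id if the record has one, otherwise generate one.
        node_uuid = node.get("uuid") or str(uuid.uuid4())
        if node_uuid not in seen:  # a person listed twice with the same id is created once
            seen.add(node_uuid)
            members.append({"uuid": node_uuid, "name": node["Name"]})
        if child_uuid is not None:
            links.append({"uuid1": child_uuid, "uuid2": node_uuid})
        for parent in node.get("parents", []):
            visit(parent, node_uuid)

    visit(tree, None)
    return members, links


family = {"Name": "Bob", "parents": [
    {"Name": "Roger", "parents": [{"Name": "Robert"}, {"Name": "Jessica"}]},
    {"Name": "Susan", "parents": [{"Name": "George"}, {"Name": "Susan"}]},
]}
members, links = flatten(family)
print(len(members), len(links))  # 7 6
```

A person who appears twice without a uuid in the input gets two different ids, so pre-assign ids when the same ancestor occurs on several branches.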

answered Oct 19 '22 23:10 by Brian Underwood


The only way I have found thus far to get the data I am looking for is to actually return the relationship information, like so:

MATCH ft = (person {firstName: 'Bob'})<-[:PARENT*1..5]-(p:Person)
RETURN [n IN nodes(ft) | n.firstName] AS parentage
ORDER BY length(ft);

This returns a dataset that I am then able to morph:

["Bob", "Roger"]
["Bob", "Susan"]
["Bob", "Roger", "Robert"]
["Bob", "Susan", "George"]
["Bob", "Roger", "Jessica"]
["Bob", "Susan", "Susan"]
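The morphing step can be sketched in Python as follows. This is a hypothetical helper, assuming each row is a child-first list that starts at Bob, as in the output above; it rebuilds the nested Name/parents structure from the question. Row order does not matter, because missing intermediate ancestors are created on the way down.

```python
import json
from typing import Any, Dict, List, Sequence


def nest_paths(paths: Sequence[Sequence[str]]) -> Dict[str, Any]:
    """Merge child-first lists such as ["Bob", "Roger", "Robert"] into one tree.

    Every list must start with the same root, and element i + 1 is a parent
    of element i. Parents are matched by name under each child, so this relies
    on two parents of the same child never sharing a name.
    """
    if not paths:
        raise ValueError("no paths to merge")
    root: Dict[str, Any] = {"Name": paths[0][0]}
    for path in paths:
        node = root
        for name in path[1:]:
            parents: List[Dict[str, Any]] = node.setdefault("parents", [])
            match = next((p for p in parents if p["Name"] == name), None)
            if match is None:  # first time this ancestor is seen under this child
                match = {"Name": name}
                parents.append(match)
            node = match
    return root


rows = [["Bob", "Roger"], ["Bob", "Susan"],
        ["Bob", "Roger", "Robert"], ["Bob", "Susan", "George"],
        ["Bob", "Roger", "Jessica"], ["Bob", "Susan", "Susan"]]
print(json.dumps(nest_paths(rows)))
```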
answered Oct 19 '22 22:10 by OpenDataAlex