Neo4j Design: Property vs "Node & Relationship"

I have a node type with a string property that will very often have the same value, e.g. millions of nodes with only 5 possible values for that string. I will be searching by that property.

My question is which is better in terms of performance and memory:

  a) Implement it as a node property and have lots of duplicates (and search using WHERE).
  b) Implement it as 5 additional nodes, where all original nodes reference one of them (and search using an additional MATCH).
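To make the two options concrete, here is a minimal in-memory sketch in Python. It is not Neo4j's storage engine and says nothing about its real cost model; the property name `kind`, the five values and the node count are all hypothetical. It only shows the shape of each lookup: option (a) inspects a property on every node, option (b) starts from one of the five value nodes and follows its references.

```python
from collections import defaultdict

# Hypothetical data: 1000 nodes, each carrying one of five possible values.
VALUES = ["A", "B", "C", "D", "E"]
nodes = [(i, VALUES[i % 5]) for i in range(1000)]

# Option (a): the value is duplicated as a property on every node.
properties = {node_id: {"kind": value} for node_id, value in nodes}

def find_by_property(kind):
    """Inspect every node's property, like an unindexed WHERE filter."""
    return [nid for nid, props in properties.items() if props["kind"] == kind]

# Option (b): five "value" nodes, each holding references to its members.
members = defaultdict(list)
for node_id, value in nodes:
    members[value].append(node_id)

def find_by_value_node(kind):
    """Start at the value node and follow its references."""
    return members[kind]

a_result = find_by_property("C")
b_result = find_by_value_node("C")
```

Both functions return the same node ids; the difference is how many nodes have to be touched to get there. In option (a) an index on the property would avoid the full scan, at the cost of maintaining that index.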

Martynas asked Mar 18 '13 09:03


3 Answers

Without knowing further details it's hard to give a general purpose answer.

From a performance perspective it's better to limit the search as early as possible. It is even more beneficial if you do not have to look into properties during a traversal.

Given that, I assume it's better to move the lookup property into a separate node and use the value as the relationship type.

Stefan Armbruster answered Oct 10 '22 12:10


Use labels; this blog post is a good intro to this new Neo4j 2.0 feature:

  • Labels and Schema Indexes in Neo4j
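As a rough mental model only: a label behaves like a named set that a node belongs to, so "all nodes with value X" becomes "all nodes in set X". The Python sketch below illustrates that idea with a toy index; the class `LabelStore`, its methods and the label names are hypothetical and do not reflect how Neo4j stores labels internally.

```python
from collections import defaultdict

class LabelStore:
    """Toy label index: maps each label to the ids of nodes carrying it."""

    def __init__(self):
        self._by_label = defaultdict(set)

    def add_label(self, node_id, label):
        self._by_label[label].add(node_id)

    def remove_label(self, node_id, label):
        # discard() does nothing if the node does not carry the label
        self._by_label[label].discard(node_id)

    def nodes_with(self, label):
        """Return the ids of all nodes carrying the label, sorted."""
        return sorted(self._by_label[label])

store = LabelStore()
for node_id, label in [(1, "Red"), (2, "Blue"), (3, "Red"), (4, "Green")]:
    store.add_label(node_id, label)

red_nodes = store.nodes_with("Red")
```

With only 5 distinct values, each value would map to one label, and changing a node's value means removing one label and adding another.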
Eduardo Pareja Tobes answered Oct 10 '22 11:10


I've thought about this problem a little as well. In my case, I had to represent state:

  • STARTED
  • IN_PROGRESS
  • SUBMITTED
  • COMPLETED

Overall, the node + relationship approach looks more appealing: only a single relationship reference needs to be maintained for each node rather than a property string, and you don't need to maintain and scan an additional index on the property. Intuitively, both memory and performance favor this approach.

Another advantage is that it easily supports linking a node to multiple "special nodes". If you foresee a situation where this should be possible in your model, this is better than having to use a property array (and searching with IN).
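A small Python sketch of that difference, using plain dictionaries rather than a real graph. The special node names `FLAGGED` and `ARCHIVED` and the node ids are hypothetical. With relationships, linking a node to a second special node is just one more link; with a property array, every search has to test membership in each node's array.

```python
from collections import defaultdict

# Relationship approach: special node name -> ids of linked nodes.
linked = defaultdict(set)

def link(node_id, special):
    """Add one relationship from a node to a special node."""
    linked[special].add(node_id)

link(1, "FLAGGED")
link(1, "ARCHIVED")  # the same node linked to a second special node
link(2, "FLAGGED")

# Property array approach: each node carries a list of values.
tags = {1: ["FLAGGED", "ARCHIVED"], 2: ["FLAGGED"]}

def find_by_array(special):
    """Check every node's array for the value, like a filter using IN."""
    return {nid for nid, values in tags.items() if special in values}

via_links = linked["ARCHIVED"]
via_array = find_by_array("ARCHIVED")
```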

In practice I found that the problem then became how to access these special nodes each time. Either you maintain some sort of constants reference holding the node IDs of the special nodes, so you can jump right to them in your START statement (this is what we do), or you search against a property of the special node each time (name, perhaps) and then traverse its relationships. This doesn't make for the prettiest of Cypher queries.
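The two access strategies can be sketched in Python as follows. Everything here is hypothetical (the node ids, the `name` and `title` properties, the `STATE_NODE_IDS` table) and stands in for a real database only loosely; it is meant to show the trade-off, not the actual setup described above.

```python
# Hypothetical store: node id -> properties. Ids 0..3 are the special nodes.
node_props = {
    0: {"name": "STARTED"},
    1: {"name": "IN_PROGRESS"},
    2: {"name": "SUBMITTED"},
    3: {"name": "COMPLETED"},
    10: {"title": "task A"},
    11: {"title": "task B"},
}

# Special node id -> ids of the nodes related to it.
relationships = {0: [], 1: [10], 2: [], 3: [11]}

# Strategy 1: a constants table kept in application code.
STATE_NODE_IDS = {"STARTED": 0, "IN_PROGRESS": 1, "SUBMITTED": 2, "COMPLETED": 3}

def members_via_constant(state):
    """Jump straight to the special node by its known id."""
    return relationships[STATE_NODE_IDS[state]]

# Strategy 2: find the special node by its name property on every call.
def members_via_name_search(state):
    """Search for the special node by name, then follow its relationships."""
    for node_id, props in node_props.items():
        if props.get("name") == state:
            return relationships[node_id]
    return []
```

The constants table is the faster path but has to be kept in sync with the database by hand; the name search needs no extra bookkeeping but repeats the lookup on every query.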

Matt Wielbut answered Oct 10 '22 10:10