
Neo4j: label vs. indexed property?


Suppose you're Twitter, and:

  • You have (:User) and (:Tweet) nodes;
  • Tweets can get flagged; and
  • You want to query the list of flagged tweets currently awaiting moderation.

You can either add a label for those tweets, e.g. :AwaitingModeration, or add and index a property, e.g. isAwaitingModeration = true|false.

Is one option inherently better than the other?

I know the best answer is probably to try and load test both :), but is there anything from Neo4j's implementation POV that makes one option more robust or suited for this kind of query?

Does it depend on the volume of tweets in this state at any given moment? If it's in the 10s vs. the 1000s, does that make a difference?

My impression is that labels are better suited for a large volume of nodes, whereas indexed properties are better for smaller volumes (ideally, unique nodes), but I'm not sure if that's actually true.

Thanks!

Aseem Kishore, asked Jan 15 '15 03:01




1 Answer

UPDATE: A follow-up blog post has been published.

This is a common question when we model datasets for customers and a typical use case for Active/NonActive entities.

Here is some feedback from what I've experienced; it is valid for Neo4j 2.1.6:

Point 1. There is no difference in db accesses between matching on a label and matching on an indexed property when you simply return the matched nodes.

Point 2. The difference appears when such nodes are at the end of a pattern, for example:
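To make Point 1 concrete, here is a minimal sketch of the two lookups being compared, using the label and property names from the question. The index statement uses the Neo4j 2.x syntax that matches the version discussed here (newer versions use CREATE INDEX ... FOR (n:Label) ON (n.property)). Prefixing each MATCH with PROFILE is one way to check the db hits on your own data.

```cypher
// Option A: a dedicated label (name taken from the question).
MATCH (t:AwaitingModeration)
RETURN t;

// Option B: an indexed boolean property (Neo4j 2.x index syntax).
CREATE INDEX ON :Tweet(isAwaitingModeration);

MATCH (t:Tweet)
WHERE t.isAwaitingModeration = true
RETURN t;
```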

MATCH (n:User {id:1}) WITH n MATCH (n)-[:WRITTEN]->(post:Post) WHERE post.published = true RETURN n, collect(post) as posts; 

-

PROFILE MATCH (n:User) WHERE n._id = 'c084e0ca-22b6-35f8-a786-c07891f108fc'
> WITH n
> MATCH (n)-[:WRITTEN]->(post:BlogPost)
> WHERE post.active = true
> RETURN n, size(collect(post)) as posts;
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| n                                                                                                                                                         | posts |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Node[118]{_id:"c084e0ca-22b6-35f8-a786-c07891f108fc",login:"joy.wiza",password:"7425b990a544ae26ea764a4473c1863253240128",email:"[email protected]"} | 1     |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row

ColumnFilter(0)
  |
  +Extract
    |
    +ColumnFilter(1)
      |
      +EagerAggregation
        |
        +Filter
          |
          +SimplePatternMatcher
            |
            +SchemaIndex

+----------------------+------+--------+----------------------+----------------------------------------------------------------------------+
|             Operator | Rows | DbHits |          Identifiers |                                                                      Other |
+----------------------+------+--------+----------------------+----------------------------------------------------------------------------+
|      ColumnFilter(0) |    1 |      0 |                      |                                                      keep columns n, posts |
|              Extract |    1 |      0 |                      |                                                                      posts |
|      ColumnFilter(1) |    1 |      0 |                      |                                           keep columns n,   AGGREGATION153 |
|     EagerAggregation |    1 |      0 |                      |                                                                          n |
|               Filter |    1 |      3 |                      | (hasLabel(post:BlogPost(1)) AND Property(post,active(8)) == {  AUTOBOOL1}) |
| SimplePatternMatcher |    1 |     12 | n, post,   UNNAMED84 |                                                                            |
|          SchemaIndex |    1 |      2 |                 n, n |                                                {  AUTOSTRING0}; :User(_id) |
+----------------------+------+--------+----------------------+----------------------------------------------------------------------------+

Total database accesses: 17

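In this case, Cypher will not make use of the index :Post(published) (:BlogPost(active) in the profiled query): the posts are reached by expanding from the user node, and the property is then checked on each of them in the Filter step.

Thus using a label is more performant here, e.g. with an ActivePost label: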

neo4j-sh (?)$ PROFILE MATCH (n:User) WHERE n._id = 'c084e0ca-22b6-35f8-a786-c07891f108fc'
> WITH n
> MATCH (n)-[:WRITTEN]->(post:ActivePost)
> RETURN n, size(collect(post)) as posts;
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| n                                                                                                                                                         | posts |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Node[118]{_id:"c084e0ca-22b6-35f8-a786-c07891f108fc",login:"joy.wiza",password:"7425b990a544ae26ea764a4473c1863253240128",email:"[email protected]"} | 1     |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row

ColumnFilter(0)
  |
  +Extract
    |
    +ColumnFilter(1)
      |
      +EagerAggregation
        |
        +Filter
          |
          +SimplePatternMatcher
            |
            +SchemaIndex

+----------------------+------+--------+----------------------+----------------------------------+
|             Operator | Rows | DbHits |          Identifiers |                            Other |
+----------------------+------+--------+----------------------+----------------------------------+
|      ColumnFilter(0) |    1 |      0 |                      |            keep columns n, posts |
|              Extract |    1 |      0 |                      |                            posts |
|      ColumnFilter(1) |    1 |      0 |                      | keep columns n,   AGGREGATION130 |
|     EagerAggregation |    1 |      0 |                      |                                n |
|               Filter |    1 |      1 |                      |     hasLabel(post:ActivePost(2)) |
| SimplePatternMatcher |    1 |      4 | n, post,   UNNAMED84 |                                  |
|          SchemaIndex |    1 |      2 |                 n, n |      {  AUTOSTRING0}; :User(_id) |
+----------------------+------+--------+----------------------+----------------------------------+

Total database accesses: 7
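A status label such as ActivePost only helps if it is kept in sync with the state of the post. As a rough sketch, toggling it could look like the following; the _id property on the post and the value 'some-post-id' are hypothetical and only there for illustration.

```cypher
// Activate: add the status label when the post becomes active.
// (_id on BlogPost and 'some-post-id' are assumptions for this example.)
MATCH (post:BlogPost {_id: 'some-post-id'})
SET post:ActivePost;

// Deactivate: remove it again so the label always reflects the current state.
MATCH (post:BlogPost {_id: 'some-post-id'})
REMOVE post:ActivePost;
```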

Point 3. Always use labels for positives. In the case above, having a Draft label instead would force you to execute the following query:

MATCH (n:User {id:1}) WITH n MATCH (n)-[:POST]->(post:Post) WHERE NOT post:Draft RETURN n, collect(post) as posts;

This means that Cypher will open the label header of each node and filter on it.

Point 4. Avoid the need to match on multiple labels:
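For comparison, a sketch of the "positive" version of the same query, assuming the published posts carry their own label. The :Published label is hypothetical here; it plays the same role as ActivePost in Point 2.

```cypher
// The wanted state is expressed directly in the pattern,
// so no negated label check is needed.
MATCH (n:User {id:1})
WITH n
MATCH (n)-[:POST]->(post:Published)
RETURN n, collect(post) as posts;
```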

MATCH (n:User {id:1}) WITH n MATCH (n)-[:POST]->(post:Post:ActivePost) RETURN n, collect(post) as posts;

neo4j-sh (?)$ PROFILE MATCH (n:User) WHERE n._id = 'c084e0ca-22b6-35f8-a786-c07891f108fc'
> WITH n
> MATCH (n)-[:WRITTEN]->(post:BlogPost:ActivePost)
> RETURN n, size(collect(post)) as posts;
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| n                                                                                                                                                         | posts |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Node[118]{_id:"c084e0ca-22b6-35f8-a786-c07891f108fc",login:"joy.wiza",password:"7425b990a544ae26ea764a4473c1863253240128",email:"[email protected]"} | 1     |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row

ColumnFilter(0)
  |
  +Extract
    |
    +ColumnFilter(1)
      |
      +EagerAggregation
        |
        +Filter
          |
          +SimplePatternMatcher
            |
            +SchemaIndex

+----------------------+------+--------+----------------------+---------------------------------------------------------------+
|             Operator | Rows | DbHits |          Identifiers |                                                         Other |
+----------------------+------+--------+----------------------+---------------------------------------------------------------+
|      ColumnFilter(0) |    1 |      0 |                      |                                         keep columns n, posts |
|              Extract |    1 |      0 |                      |                                                         posts |
|      ColumnFilter(1) |    1 |      0 |                      |                              keep columns n,   AGGREGATION139 |
|     EagerAggregation |    1 |      0 |                      |                                                             n |
|               Filter |    1 |      2 |                      | (hasLabel(post:BlogPost(1)) AND hasLabel(post:ActivePost(2))) |
| SimplePatternMatcher |    1 |      8 | n, post,   UNNAMED84 |                                                               |
|          SchemaIndex |    1 |      2 |                 n, n |                                   {  AUTOSTRING0}; :User(_id) |
+----------------------+------+--------+----------------------+---------------------------------------------------------------+

Total database accesses: 12

For Cypher, this results in the same process as in Point 3.

Point 5. If possible, avoid the need to match on labels by having well-named, specific relationship types:

MATCH (n:User {id:1}) WITH n MATCH (n)-[:PUBLISHED]->(p) RETURN n, collect(p) as posts 

-

MATCH (n:User {id:1}) WITH n MATCH (n)-[:DRAFTED]->(post) RETURN n, collect(post) as posts;

neo4j-sh (?)$ PROFILE MATCH (n:User) WHERE n._id = 'c084e0ca-22b6-35f8-a786-c07891f108fc'
> WITH n
> MATCH (n)-[:DRAFTED]->(post)
> RETURN n, size(collect(post)) as posts;
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| n                                                                                                                                                         | posts |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Node[118]{_id:"c084e0ca-22b6-35f8-a786-c07891f108fc",login:"joy.wiza",password:"7425b990a544ae26ea764a4473c1863253240128",email:"[email protected]"} | 3     |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row

ColumnFilter(0)
  |
  +Extract
    |
    +ColumnFilter(1)
      |
      +EagerAggregation
        |
        +SimplePatternMatcher
          |
          +SchemaIndex

+----------------------+------+--------+----------------------+----------------------------------+
|             Operator | Rows | DbHits |          Identifiers |                            Other |
+----------------------+------+--------+----------------------+----------------------------------+
|      ColumnFilter(0) |    1 |      0 |                      |            keep columns n, posts |
|              Extract |    1 |      0 |                      |                            posts |
|      ColumnFilter(1) |    1 |      0 |                      | keep columns n,   AGGREGATION119 |
|     EagerAggregation |    1 |      0 |                      |                                n |
| SimplePatternMatcher |    3 |      0 | n, post,   UNNAMED84 |                                  |
|          SchemaIndex |    1 |      2 |                 n, n |      {  AUTOSTRING0}; :User(_id) |
+----------------------+------+--------+----------------------+----------------------------------+

Total database accesses: 2

This will be more performant because it uses the full power of the graph: Cypher just follows the relationships from the user node, which results in no more db accesses than matching the user node itself and no filtering on labels.
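With this model, a state change becomes a relationship swap rather than a label or property update. A minimal sketch, assuming the post can be identified by an id property (the id value 42 is made up for the example):

```cypher
// Publish a draft: replace the DRAFTED relationship with a PUBLISHED one.
// (The id property on the post and the value 42 are assumptions.)
MATCH (n:User {id:1})-[r:DRAFTED]->(post {id: 42})
CREATE (n)-[:PUBLISHED]->(post)
DELETE r;
```

The trade-off is that the state now lives on the relationship, so every query that cares about it has to name the relationship type explicitly.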

This was my 0,02€

Christophe Willemsen, answered Oct 23 '22 12:10
