
Querying by node labels vs relationships

I have a use case where I need to classify people's trajectories within a big room.

In terms of performance and Neo4j best practices, which option is better if I want to classify this data so that I can later search/fetch by any combination of these classifications?

The different classifications are:

  1. Gender (FEMALE, MALE)
  2. PersonType (STAFF, CUSTOMER)
  3. AgeGroup (11-20, 21-30, 31-40, etc)

A trajectory contains a set of points (time, x, y, motion_type) that tells where the person went. Each point gives the exact location of the person in the room at a given time and whether they were dwelling, walking or running (this is the motion type).

For example: get all trajectories of a FEMALE CUSTOMER aged between 21 and 30.

Option 1:

// Here I get the set of trajectories that fall within a certain time range (:Trajectory(at) is indexed)
MATCH (trajectory:Trajectory)
WHERE datetime("2020-01-01T00:00:00.000000+13:00") <= trajectory.at < datetime("2020-01-11T00:00:00.000000+13:00")

// Once I have all the trajectories I start filtering by the different criteria
MATCH (trajectory)-[:GENDER]->(:Female)
MATCH (trajectory)-[:PERSON_TYPE]->(:Customer)

// AgeGroup could have quite a lot of groups depending on how accurate the data is. At this stage we have 8 groups.
// Knowing that we have 8 groups, should I filter by property or should I have 8 different labels, one per age group? Is there any other option?
MATCH (trajectory)-[:AGE]->(age:AgeGroup)
WHERE age.group = "21-30"

RETURN COUNT(trajectory)

Option 2:

The Trajectory node will carry one extra label per applicable category. For example, to get the same result as in Option 1 I would do something like:

MATCH (trajectory:Trajectory:Female:Customer)
WHERE datetime("2020-01-01T00:00:00.000000+13:00") <= trajectory.at < datetime("2020-01-11T00:00:00.000000+13:00")
MATCH (trajectory)-[:AGE]->(age:AgeGroup)
WHERE age.group = "21-30"

RETURN COUNT(trajectory)

// Or, assuming one label per age group (a label containing a hyphen would need backticks, so an underscore is used here):
MATCH (trajectory:Trajectory:Female:Customer:Age21_30)
WHERE datetime("2020-01-01T00:00:00.000000+13:00") <= trajectory.at < datetime("2020-01-11T00:00:00.000000+13:00")

RETURN COUNT(trajectory)

So I want to know:

  1. How to handle the age groups: as a property, as separate labels, or whether there is a better option.
  2. How to handle the different categories of a trajectory: as labels on the trajectory, or as relationships pointing to a labelled node that holds the information.

As a note, not every trajectory will have every category. For example, if our facial recognition system cannot detect whether the person is female or male, that category won’t exist for that particular trajectory.

Lionel Cichero asked Jan 29 '20 01:01




1 Answer

Following the concepts in https://neo4j.com/docs/getting-started/current/graphdb-concepts/, you have two base types of nodes: Person and Location. Person nodes can have multiple labels:

  1. Male or Female for gender
  2. Staff or Customer for type of Person
  3. Age-specific labels such as Age_20_30 for the age group

Location nodes have properties for the location but not one for the time.

I would model the trajectory as relationships between a Person and a Location, each with a property for the time (at) and with the motion type as the relationship type. So you would have relationships of type DWELLING, WALKING and RUNNING between Person and Location nodes.
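
As a minimal sketch of this model, the statements below create one person, one location and one movement between them. The property names id, x and y and all values are hypothetical; only the labels, the relationship types and the at property follow the description above.

```cypher
// A person carries one label per classification that was detected.
CREATE (p:Person:Female:Customer:Age_20_30 {id: 1})
// A location stores only the coordinates, not the time.
CREATE (l:Location {x: 12.5, y: 3.0})
// The relationship type is the motion type; `at` holds the timestamp.
CREATE (p)-[:WALKING {at: datetime("2020-01-05T10:15:00+13:00")}]->(l)
```

A person whose gender could not be detected would simply be created without the Female or Male label.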

Your query then becomes:

MATCH (n:Person)-[r]->(:Location)
WHERE n:Female and n:Customer and n:Age_20_30
AND   datetime("2020-01-01T00:00:00.000000+13:00") <= r.at < datetime("2020-01-11T00:00:00.000000+13:00")
RETURN COUNT(r)

Counting only the running movements of those customers would be:

MATCH (n:Person)-[r:RUNNING]->(:Location)
WHERE n:Female and n:Customer and n:Age_20_30
AND   datetime("2020-01-01T00:00:00.000000+13:00") <= r.at < datetime("2020-01-11T00:00:00.000000+13:00")
RETURN COUNT(r)
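
Two variations may be useful, sketched here under the same assumed Person and Location labels: counting several motion types in one query, and counting movements of people for whom no gender was detected (the question notes that a category can be missing).

```cypher
// Walking or running movements of female customers in the time range.
MATCH (n:Person:Female:Customer)-[r:WALKING|RUNNING]->(:Location)
WHERE datetime("2020-01-01T00:00:00.000000+13:00") <= r.at < datetime("2020-01-11T00:00:00.000000+13:00")
RETURN count(r);

// Movements of people who carry neither gender label.
MATCH (n:Person)-[r]->(:Location)
WHERE NOT n:Female AND NOT n:Male
  AND datetime("2020-01-01T00:00:00.000000+13:00") <= r.at < datetime("2020-01-11T00:00:00.000000+13:00")
RETURN count(r);
```

Since labels are combined freely in the pattern or in WHERE, any combination of the classifications can be queried the same way.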
FredvN answered Oct 19 '22 12:10
