I have a use case where I need to classify people's trajectories within a big room.
In terms of performance and Neo4j best practices, which option is better if I want to classify this data so that I can later search/fetch it using any combination of these classifications?
The different classifications are:
A trajectory contains a set of points (time, x, y, motion_type) that describes where the person went. A point gives the exact location of the person in the room at a given time and whether they were dwelling, walking or running (the motion type).
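To make the data shape concrete, here is a minimal plain-Python sketch (not Neo4j code), assuming each point is a (time, x, y, motion_type) record. The names `Motion`, `Point`, `Trajectory` and `motion_counts` are hypothetical and only illustrate the structure described above:

```python
from collections import Counter
from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum


class Motion(Enum):
    # The three motion types mentioned in the question
    DWELLING = "dwelling"
    WALKING = "walking"
    RUNNING = "running"


@dataclass(frozen=True)
class Point:
    # One sample: where the person was at a given time, and how they were moving
    time: datetime
    x: float
    y: float
    motion_type: Motion


@dataclass
class Trajectory:
    points: list[Point] = field(default_factory=list)

    def motion_counts(self) -> Counter:
        # Number of recorded points per motion type
        return Counter(p.motion_type for p in self.points)


t = Trajectory([
    Point(datetime(2020, 1, 1, 10, 0, 0), 1.0, 2.0, Motion.WALKING),
    Point(datetime(2020, 1, 1, 10, 0, 5), 3.5, 2.0, Motion.RUNNING),
    Point(datetime(2020, 1, 1, 10, 0, 10), 3.5, 2.0, Motion.DWELLING),
    Point(datetime(2020, 1, 1, 10, 0, 15), 4.0, 2.5, Motion.WALKING),
])
print(t.motion_counts()[Motion.WALKING])  # → 2
```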
For example: get all trajectories that were FEMALE and CUSTOMER, with an age between 21 and 30.
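Independent of how it is stored in Neo4j, such a query is a conjunction of equality filters on whichever classifications are requested, plus an age range. A plain-Python sketch of those semantics, assuming each trajectory's classifications are a dict with hypothetical keys `gender`, `person_type` and `age`, and assuming "between 21 and 30" is inclusive. A trajectory that lacks a requested key simply does not match:

```python
from typing import Any, Optional


def matches(
    classes: dict[str, Any],
    criteria: dict[str, Any],
    age_range: Optional[tuple[int, int]] = None,
) -> bool:
    # Every requested classification must be present and equal
    for key, wanted in criteria.items():
        if classes.get(key) != wanted:
            return False
    # Optional inclusive age range, e.g. (21, 30)
    if age_range is not None:
        age = classes.get("age")
        if age is None or not (age_range[0] <= age <= age_range[1]):
            return False
    return True


trajectories = [
    {"id": 1, "gender": "FEMALE", "person_type": "CUSTOMER", "age": 25},
    {"id": 2, "gender": "MALE", "person_type": "CUSTOMER", "age": 27},
    {"id": 3, "person_type": "CUSTOMER", "age": 22},  # gender not detected
    {"id": 4, "gender": "FEMALE", "person_type": "CUSTOMER", "age": 35},
]
hits = [t["id"] for t in trajectories
        if matches(t, {"gender": "FEMALE", "person_type": "CUSTOMER"}, (21, 30))]
print(hits)  # → [1]
```

Any other combination is just a different `criteria` dict, which is the "any combination" requirement expressed as data.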
Option 1:
// Here I get the set of trajectories that fall within a certain time range (:Trajectory(at) is indexed)
MATCH (trajectory:Trajectory)
WHERE datetime("2020-01-01T00:00:00.000000+13:00") <= trajectory.at < datetime("2020-01-11T00:00:00.000000+13:00")
// Once I have all the trajectories I start filtering by the different criteria
MATCH (trajectory)-[:GENDER]->(:Female)
MATCH (trajectory)-[:PERSON_TYPE]->(:Customer)
// AgeGroup could have quite a lot of groups depending on how accurate the data is. At this stage we have 8 groups.
// Knowing that we have 8 groups, should I filter by property or should I have 8 different labels, one per age group? Is there any other option?
MATCH (trajectory)-[:AGE]->(age:AgeGroup)
WHERE age.group = "21-30"
RETURN COUNT(trajectory)
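Whichever storage is chosen for the age groups (a property on an `AgeGroup` node, or one label per group), a raw age first has to be mapped to a group. A sketch, assuming eight hypothetical bucket boundaries (the question only says there are 8 groups and that one of them is "21-30"). It also derives a label name such as `Age_21_30`, because a hyphen is not allowed in an unquoted Cypher label:

```python
import bisect

# Hypothetical 8 age groups: (inclusive lower bound, name). Adjust to your data.
GROUPS = [
    (0, "0-12"),
    (13, "13-20"),
    (21, "21-30"),
    (31, "31-40"),
    (41, "41-50"),
    (51, "51-60"),
    (61, "61-70"),
    (71, "71+"),
]
_LOWER_BOUNDS = [lo for lo, _ in GROUPS]


def age_group(age: int) -> str:
    # bisect_right finds the last group whose lower bound is <= age
    if age < 0:
        raise ValueError("age must be non-negative")
    idx = bisect.bisect_right(_LOWER_BOUNDS, age) - 1
    return GROUPS[idx][1]


def group_label(group: str) -> str:
    # "21-30" -> "Age_21_30"; "71+" -> "Age_71_plus" (valid unquoted Cypher label)
    return "Age_" + group.replace("-", "_").replace("+", "_plus")


print(age_group(25), group_label(age_group(25)))  # → 21-30 Age_21_30
```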
Option 2:
Each Trajectory node gets one extra label per category it has been classified into. For example, to get the same result as in Option 1 I would do something like:
MATCH (trajectory:Trajectory:Female:Customer)
WHERE datetime("2020-01-01T00:00:00.000000+13:00") <= trajectory.at < datetime("2020-01-11T00:00:00.000000+13:00")
MATCH (trajectory)-[:AGE]->(age:AgeGroup)
WHERE age.group = "21-30"
RETURN COUNT(trajectory)
// Or, assuming I have a label for each age group (a hyphen is not valid in an unquoted label, hence the underscores):
MATCH (trajectory:Trajectory:Female:Customer:Age_21_30)
WHERE datetime("2020-01-01T00:00:00.000000+13:00") <= trajectory.at < datetime("2020-01-11T00:00:00.000000+13:00")
RETURN COUNT(trajectory)
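A multi-label pattern such as `(t:Trajectory:Female:Customer:Age_21_30)` matches a node only if it carries all of those labels, which is a subset test, and the `WHERE` clause is a half-open time window. A plain-Python emulation of those semantics (not of Neo4j's planner or indexes), using hypothetical sample nodes:

```python
from datetime import datetime, timedelta, timezone

TZ = timezone(timedelta(hours=13))  # same +13:00 offset as in the queries
START = datetime(2020, 1, 1, tzinfo=TZ)
END = datetime(2020, 1, 11, tzinfo=TZ)

# Hypothetical nodes: (labels, at)
nodes = [
    ({"Trajectory", "Female", "Customer", "Age_21_30"}, datetime(2020, 1, 5, tzinfo=TZ)),
    ({"Trajectory", "Female", "Customer", "Age_21_30"}, datetime(2020, 1, 11, tzinfo=TZ)),  # end is exclusive
    ({"Trajectory", "Customer", "Age_21_30"}, datetime(2020, 1, 3, tzinfo=TZ)),  # no gender label
    ({"Trajectory", "Female", "Customer", "Age_31_40"}, datetime(2020, 1, 4, tzinfo=TZ)),
]


def count_matching(required: set[str]) -> int:
    # A node matches when it has every required label and START <= at < END
    return sum(1 for labels, at in nodes
               if required <= labels and START <= at < END)


print(count_matching({"Trajectory", "Female", "Customer", "Age_21_30"}))  # → 1
print(count_matching({"Trajectory", "Customer"}))  # → 3
```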
So I want to know:
As a note, not every trajectory will have every category. For example, if our facial recognition system cannot detect whether the person is female or male, that category won’t exist for that particular trajectory.
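With labels, an undetected category simply means the corresponding label is never added: queries that filter on that category skip the trajectory, while queries that do not filter on it still find it. A sketch of that label derivation, assuming a hypothetical dict of detected classifications in which `None` means "not detected":

```python
from typing import Optional


def labels_for(classes: dict[str, Optional[str]]) -> set[str]:
    # Every node gets the base label; each detected category adds one more label
    labels = {"Trajectory"}
    for value in classes.values():
        if value is not None:
            labels.add(value)
    return labels


undetected_gender = {"gender": None, "person_type": "Customer", "age_group": "Age_21_30"}
labels = labels_for(undetected_gender)
print(sorted(labels))  # → ['Age_21_30', 'Customer', 'Trajectory']
print({"Trajectory", "Customer"} <= labels)  # → True
print({"Trajectory", "Female"} <= labels)  # → False
```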
Labels are used when defining constraints and adding indexes for properties (see Schema). An example would be a label named User that you label all your nodes representing users with. With that in place, you can ask Neo4j to perform operations only on your user nodes, such as finding all users with a given name.
A relationship connects two nodes — a start node and an end node. Just like nodes, relationships can have properties. Relationships between nodes are a key part of a graph database.
The Neo4j property graph database model consists of nodes and relationships. Nodes describe entities (discrete objects) of a domain and can have zero or more labels that define (classify) what kind of nodes they are. Relationships describe a connection between a source node and a target node.
If you follow the concepts in https://neo4j.com/docs/getting-started/current/graphdb-concepts/, you have two base node types, Person and Location. Person nodes can have multiple labels.
Location nodes have properties for the location, but none for the time.
I would model the trajectory as relationships between a Person and a Location, each with a property for the time and with the relationship type giving the kind of movement. So you could have relationships of type DWELLING, WALKING and RUNNING between Person and Location nodes.
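In this model each trajectory point becomes one relationship whose type is the motion type and which carries the timestamp. A plain-Python sketch of that conversion, assuming the question's (time, x, y, motion_type) point tuples and using the (x, y) pair as a stand-in for a Location node; the `Rel` class and `points_to_rels` helper are hypothetical:

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Rel:
    # Stands for (:Person)-[:REL_TYPE {at}]->(:Location)
    person_id: int
    rel_type: str  # "DWELLING", "WALKING" or "RUNNING"
    location: tuple[float, float]  # (x, y) of the Location node
    at: datetime


def points_to_rels(person_id, points):
    # points: iterable of (time, x, y, motion_type) tuples, as in the question
    return [Rel(person_id, motion.upper(), (x, y), time)
            for time, x, y, motion in points]


rels = points_to_rels(7, [
    (datetime(2020, 1, 2, 9, 0), 1.0, 1.0, "walking"),
    (datetime(2020, 1, 2, 9, 1), 4.0, 1.0, "running"),
    (datetime(2020, 1, 2, 9, 2), 4.0, 1.0, "dwelling"),
])
print([r.rel_type for r in rels])  # → ['WALKING', 'RUNNING', 'DWELLING']
print(len({r.location for r in rels}))  # → 2
```

Points recorded at the same (x, y) share one Location node, which is why three relationships lead to only two locations here.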
With that model, your query would be:
MATCH (n:Person)-[r]->(:Location)
WHERE n:Female AND n:Customer AND n:Age_20_30
AND datetime("2020-01-01T00:00:00.000000+13:00") <= r.at < datetime("2020-01-11T00:00:00.000000+13:00")
RETURN COUNT(r)
Counting running female customers aged 20-30 would be:
MATCH (n:Person)-[r:RUNNING]->(:Location)
WHERE n:Female AND n:Customer AND n:Age_20_30
AND datetime("2020-01-01T00:00:00.000000+13:00") <= r.at < datetime("2020-01-11T00:00:00.000000+13:00")
RETURN COUNT(r)
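Note that `COUNT(r)` in these queries counts matching relationships (that is, trajectory points), not people. A plain-Python emulation of both queries over hypothetical data, showing the difference between counting relationships and counting distinct persons (in Cypher the latter would be `COUNT(DISTINCT n)`):

```python
from datetime import datetime, timedelta, timezone

TZ = timezone(timedelta(hours=13))
START = datetime(2020, 1, 1, tzinfo=TZ)
END = datetime(2020, 1, 11, tzinfo=TZ)

# Hypothetical Person nodes and their labels
person_labels = {
    1: {"Person", "Female", "Customer", "Age_20_30"},
    2: {"Person", "Male", "Customer", "Age_20_30"},
}
# Hypothetical relationships: (person_id, relationship type, at)
rels = [
    (1, "WALKING", datetime(2020, 1, 2, 9, 0, tzinfo=TZ)),
    (1, "RUNNING", datetime(2020, 1, 2, 9, 1, tzinfo=TZ)),
    (1, "RUNNING", datetime(2020, 1, 2, 9, 2, tzinfo=TZ)),
    (2, "RUNNING", datetime(2020, 1, 3, 9, 0, tzinfo=TZ)),
]
WANTED = {"Female", "Customer", "Age_20_30"}


def matching(rel_type=None):
    # Same filters as the Cypher: labels on n, optional type on r, half-open time window
    return [(pid, t) for pid, t, at in rels
            if WANTED <= person_labels[pid]
            and (rel_type is None or t == rel_type)
            and START <= at < END]


print(len(matching()))  # COUNT(r) → 3
print(len(matching("RUNNING")))  # COUNT(r) with :RUNNING → 2
print(len({pid for pid, _ in matching("RUNNING")}))  # COUNT(DISTINCT n) → 1
```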