
Data Modelling for Big Data

[Image: graph of cities connected by weighted edges]

I have this type of database to implement. The circled points are various cities.

People are travelling from one city to another. The number of people travelling from one city to another is shown by the weight on the edge between them.

Circle G is my goal city.

What do I want to achieve?

  1. The total number of people reaching "G".
  2. The paths they followed to reach goal "G".

e.g.:

  • 200 people started along A->F.

  • 100 go back to A using path F->A.

  • Of the remaining 100, only 20 users reached goal "G".

So the number of people reaching "G" from the right side is 80.

What information do I need at point "G"?

  • 80 people from the right side = 20 (from A->F->G) + 60 (from A->D->F->G)

This is a small graph. I want to implement this on a graph with 1000+ nodes.

Right now, the approach I am taking to solve this (using ArangoDB) is:

  • I am creating one vertex collection and one edge collection.

  • Each city (A, B, C, D) is a document inside the same collection.

  • I am saving the complete previous path for every person travelling.

e.g. John is travelling from A->G:

  • The details I am saving at city F for John: {"John : A_D_F"}

  • The details I am saving at city G for John: {"John : A_D_F_G"}

  • I am repeating this for every single person travelling.
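
The per-person path strings described above can be aggregated into a funnel for any city. Here is a minimal sketch in Python, assuming each traveller's path is available as an underscore-separated string such as "A_D_F_G". The names `stored_paths` and `funnel_at` and the sample travellers are hypothetical and only serve the illustration; this is not ArangoDB code.

```python
from collections import Counter

# Hypothetical sample data: one stored path string per traveller,
# in the "A_D_F_G" format described above.
stored_paths = {
    "John": "A_D_F_G",
    "Mary": "A_F_G",
    "Ravi": "A_D_F_G",
    "Lena": "A_F",    # stopped at F and never reached G
    "Omar": "B_C_G",
}


def funnel_at(city, paths):
    """Count travellers who reached `city`, grouped by the route they took to it."""
    routes = Counter()
    for path in paths.values():
        stops = path.split("_")
        if city in stops:
            # Keep only the part of the path up to the first arrival at `city`.
            prefix = stops[: stops.index(city) + 1]
            routes["->".join(prefix)] += 1
    return routes


funnel_g = funnel_at("G", stored_paths)  # routes into the goal city
total_g = sum(funnel_g.values())         # total number of people reaching G
funnel_f = funnel_at("F", stored_paths)  # the same funnel at another city
```

Because the path stored at G already contains the path stored at F as a prefix, one path per traveller may be enough to compute the funnel at every city along the way.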

In short, I want to achieve funneling at any point (city) in the graph.

What is a better way to model the data for this type of graph in ArangoDB or another big data store, and which big data store would be best?

Thanks!

Prakash P asked Dec 31 '25 15:12


1 Answer

You are right to treat this as a graph problem. Irrespective of the tech stack you want to use, I suggest you model your data by following some of the best practices and examples outlined in these links:

https://neo4j.com/developer/guide-data-modeling/

https://www.infoq.com/articles/data-modeling-graph-databases

There are many proven choices that scale to a graph of 1,000 or even 10,000 nodes.

Here is one possible way to model this:

a] Treat the cities and persons as nodes.

b] Model each city-to-city path as a relationship.

c] Add person-has-travelled-to-city as a relationship.

d] If you need to sequence the visits, use properties on the person-to-city relationship.
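
The model in a] to d] can be tried out in memory before committing to a graph database. The sketch below is plain Python, not tied to any particular product; the `Visit` class, the `seq` property name, and the sample travellers are assumptions made for illustration. The roads are the city-to-city edges mentioned in the question.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Visit:
    """Person-has-travelled-to-city relationship with a sequence property."""
    person: str
    city: str
    seq: int  # position of this stop in the person's journey (assumed name)


# b] City-to-city relationships (directed edges named in the question).
roads = {("A", "D"), ("A", "F"), ("D", "F"), ("F", "A"), ("F", "G")}

# c] and d] One Visit per stop; persons and cities (a]) are the endpoints.
visits = [
    Visit("John", "A", 0), Visit("John", "D", 1),
    Visit("John", "F", 2), Visit("John", "G", 3),
    Visit("Mary", "A", 0), Visit("Mary", "F", 1), Visit("Mary", "G", 2),
    Visit("Lena", "A", 0), Visit("Lena", "F", 1), Visit("Lena", "A", 2),
]


def journeys(all_visits):
    """Rebuild each person's ordered list of cities from the seq property."""
    by_person = defaultdict(list)
    for v in sorted(all_visits, key=lambda v: (v.person, v.seq)):
        by_person[v.person].append(v.city)
    return dict(by_person)


def follows_roads(stops):
    """Check that every consecutive pair of stops is an existing road."""
    return all(pair in roads for pair in zip(stops, stops[1:]))


def routes_into(goal, all_visits):
    """Count people who reached `goal`, grouped by the route they followed."""
    counts = Counter()
    for stops in journeys(all_visits).values():
        if goal in stops:
            counts[tuple(stops[: stops.index(goal) + 1])] += 1
    return counts


routes_g = routes_into("G", visits)
```

In a graph database the same question would be a traversal over the person-to-city relationships ordered by the sequence property; the sketch only shows that the model carries enough information to answer both questions in the post.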

The next steps are to:

  • Create these in a graph database of your choice
  • Create a sample dataset
  • Run your queries and check the answers
  • See if you need to optimize either the model or the data

Hope this helps.

Satya Kaliki answered Jan 03 '26 13:01


