Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

The best way to store graph into the memory

The problem is that I have 150 000+ nodes with 200 000+ (may vary up to 1 000 000 or even more) all of them are written to a DB. Now I'd like to create a normal graph which will open access to routing. So, I need to compose it using data from existing DB. The idea is to build this huge graph, divide it into small pieces and write to DB BLOBS for storing. I tried to build it recursively but it seems to me that stack could't store so much data and all the time my algorithm breaks with allocation error. So, now I'm a bit confused with a way which will allow me to build this graph. I'm thinking about some kind of iterative method, but the main problem is architecture, I mean structures which I'm going to use for storing nodes and arcs. As I see this solution it should be smith like that:

struct Graph
{
    unsigned int nodesAmount;
    unsigned int arcsAmount;
    vector<Node*> NodeArr; //Some kind of container to store all existing Nodes
}

struct Node 
{
    unsigned int id;
    int dimension; //how many arcs use this node
    vector<Arcs*> ArcArr;
}

struct Arcs 
{
    unsigned int id;
    double cost;
    Node* Node_from;
    Node* Node_to;
}

I read lots of articles about method of storing graphs, but didn't find really good solution for such huge graphs. I would be very pleased for any ideas. Thank you

like image 463
tema Avatar asked Feb 15 '23 13:02

tema


2 Answers

You are on the right path.

Some small changes that I would suggest:

struct Graph
{
    unsigned int nodesAmount;
    unsigned int arcsAmount;
    vector<Node> NodeArr; // Store the nodes directly, not pointers
}

struct Node 
{
    unsigned int id;
    int dimension; //how many arcs use this node
    vector<int> Neighbours; // store neighbour IDs, saves memory
}

Since you are moving between database and C I would strongly suggest not to use pointers because those do not translate. Use IDs and look up your nodes by ID. If you need to store the edges separately then also do this by ID, not by pointer.

like image 160
Sergey L. Avatar answered Feb 17 '23 11:02

Sergey L.


I know that this solution has nothing to do with your snippet, but i'd like to show you another way.

The option that's used quite often is to have two arrays - one for the edges, one for the vertices.

The vertices array points to the edges array and says where the adjacent vertices start. The edges array stores the adjacent vertices itself.

For instance :

V = 6, E = 7

vertices = [0, 1, 1, 2, 5, 6]
edges = [1, 2, 3, 4, 5, 6, 0]

Considering the indexes, the edges array would look like :

| [1] | [] | [2] | [3, 4, 5] | [6] | [0] |

So the first vertex has a single adjacent vertex (with id 1), the fifth vertex has 3 adjacent vertices with IDs 3, 4, 5 etc.

like image 26
Danstahr Avatar answered Feb 17 '23 11:02

Danstahr