Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Good data structure for representing multigraph (C++)

What is the best data structure to describe an unoriented multigraph (optimized for speed and memory)?

A list of edges would be inappropriate since getting the neighbors of a vertex happens often in my code.

An adjacency list isn't good because I have to keep information about visited edges, and when an edge from 1 to 3 is visited (say I am traversing through the neighbors of 1 and found an edge that leads to three and has weight w), I have to find the same edge in the list of neighbors of 3 to mark it as visited, which is slow.

I've thought about adjacency matrix when each cell would be set<Edge> where Edge is a structure representing info about if the vertex is visited, weight of the edge, etc. However, when having graph[0][1][i] as visited I can't set the same edge in graph[1][0]'s edges without linear searching.

Are there any good approaches and techniques when representing multigraphs? I don't want third library solutions like boost::AdjacencyList; I have to write it on my own.

Edit:Sorry for the misunderstanding. It's an exercise for the University and I can use only Standard library to do it. The graph has constraints: 0 < n ≤ 300 - number of vertices 0 < m ≤ 20000 - number of edges 1 ≤ w ≤ 500

I have memory limit of 32 MB and time limit 0.5s (I have to traverse in with DFS).

like image 419
GeneralFailure Avatar asked Dec 24 '12 14:12

GeneralFailure


1 Answers

A somewhat complex representation that however provides efficient local operations is the following

struct Link;

struct Node
{
    Link *firstIn, *lastIn, *firstOut, *lastOut;
    ... node data ...
};

struct Link
{
    Node *from, *to;
    Link *prevInFrom, *nextInFrom, *prevInTo, *nextInTo;
    ... link data ...
};

Basically for each Node there are two double-linked lists, one for incoming links and one for outgoing links. Each Link knows the starting and ending Node and also has the prev and next pointers for the two lists that contain it (the outgoing list in the "from" node and the incoming list in the "to" node).

Using this approach you get O(1) node and link creation and destruction, O(inbound_deg(node)) for finding which links are arriving to a node, O(outbound_deg(node)) for finding which links are leaving a node. The structure also supports multiple connections between the same pair of nodes and also multiple loops.

The space required is a fixed amount per node and per link but the overhead can be ok or not depending on the application (4 pointers per node and 6 pointers per link). Using simple lists instead of double-linked lists the overhead becomes 2 pointers per node and 4 per link, but link deletion becomes O(outbound_deg(from) + inbound_deg(to)) and is not constant any more.

Note also that the shown structure is not cache friendly and in modern desktop computers may be a more "brute-force" approach like e.g. vectors of pointers instead of a doubly-linked list can offer better speed depending on how big the list is going to be and how often you're mutating the graph structure.

It may even make sense to split the link object to embed the forward link data for example in the "from" node, keeping back-pointers in the "to" node.

struct Node
{
    struct OutgoingLink
    {
        Node *to;
        int incoming_index;
        ... link data ...
    };

    struct IncomingLink
    {
        Node *from;
        int outgoing_index;
    };

    std::vector<OutgoingLink> outgoing_links;
    std::vector<IncomingLink> incoming_links;

    ... node data ...
};

If you're most of the time traversing links in the forward direction and if links are not added to existing nodes, then even better would be to just use one memory chunk for a node and the outgoing link data, but this unfortunately is not easily supported by C++.

In C it would be

typedef struct TOutgoingLink
{
    struct TNode *to;
    int incoming_index;
    ... link data ...
} OutgoingLink;

typedef struct TIncomingLink
{
    struct TNode *from;
    int outgoing_index;
} IncomingLink;

typedef struct TNode
{
    ... node data ...
    int num_incoming_links;
    int num_outgoing_links;
    IncomingLink *incoming_links;   // separate array
    OutgoingLink outgoing_links[1]; // embedded array starting here
} Node;

using malloc(sizeof(Node) + (num_outgoing_links-1)*sizeof(OutgoingLink)) to allocate the space for a node.

With this approach all data for a node and its outgoing links is going to be in adjacent memory locations.

like image 92
6502 Avatar answered Oct 19 '22 19:10

6502