I'm currently working on a directed graph data structure in C++ (no Boost Graph Library for this project). The primary application will be identifying connected components and sinks. The graphs are expected to be sparse (E ~ 4V as an upper limit on the number of edges) and all edges will have uniform weight. I'm trying to decide between an adjacency list, an incidence list, or possibly some other representation that I haven't heard of yet (an adjacency matrix is not an option because of the sparsity). The bottleneck is likely going to be overall space and the speed of graph initialization: graphs will be initialized from potentially huge arrays such that each element in the array ends up being a vertex with a directed edge to one of its neighboring elements. To get the edges for each vertex, all of its neighboring elements must be compared first.
My questions are: (1) Which representation is typically faster to initialize and also fast for BFS traversal, (2) What algorithms (other than vanilla BFS) are there for finding connected components? I know it's O(V+E) using BFS (which is optimal, I think) but I'm worried about the size of the intermediate queue as the graph width grows exponentially with height.
Don't have too much experience with graph implementations, so I'd be grateful for any suggestions.
Sparse graph. Definition: A graph in which the number of edges is much less than the possible number of edges. See also dense graph, complete graph, adjacency-list representation.
There are two established ways of implementing a graph: the adjacency matrix and the adjacency list.
A graph is a binary relation. It provides a powerful visualization as a set of points (called nodes) connected by lines (called edges) or arrows (called arcs). In this regard, the graph is a generalization of the tree data model.
A wide variety of network graphs happen to be sparse. But the index with which sparsity is commonly measured in network graphs is edge density, reflecting the proportion of the sum of the degrees of all nodes in the graph compared to the total possible degrees in the corresponding fully connected graph.
Consider a layout as follows:
An adjacency list can be implemented as an N×4 array (N being 3 in this case, and 4 because you say that 4 is the maximum number of edges per vertex in your case) in the following form:
2 3 0 0
3 0 0 0
0 0 0 0
The above representation assumes that the vertices are numbered consecutively from 1, so that the row for vertex v is found at index (v-1), and that 0 marks an unused slot.
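As a rough sketch of how that table could be held in C++, the snippet below stores the rows in one flat `std::vector`. The class name `FixedAdjacency`, the helper `make_example`, and the use of 32-bit vertex ids are my own assumptions for illustration; it keeps the same conventions as the table (vertices numbered from 1, 0 meaning "empty slot"). A sink then shows up as a row whose first slot is still 0.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical fixed-width adjacency array: kMaxOut slots per vertex,
// stored row-major in one contiguous buffer. Vertices are 1-based and
// 0 marks an empty slot, matching the table above.
class FixedAdjacency {
public:
    static constexpr std::size_t kMaxOut = 4;

    explicit FixedAdjacency(std::size_t vertex_count)
        : n_(vertex_count), slots_(vertex_count * kMaxOut, 0) {}

    // Adds the edge from -> to. Returns false if the row is already full.
    bool add_edge(std::uint32_t from, std::uint32_t to) {
        assert(from >= 1 && from <= n_ && to >= 1 && to <= n_);
        const std::size_t base = (from - 1) * kMaxOut;
        for (std::size_t i = 0; i < kMaxOut; ++i) {
            if (slots_[base + i] == 0) {
                slots_[base + i] = to;
                return true;
            }
        }
        return false;
    }

    // Returns the i-th slot of vertex v (0 if that slot is unused).
    std::uint32_t slot(std::uint32_t v, std::size_t i) const {
        assert(v >= 1 && v <= n_ && i < kMaxOut);
        return slots_[(v - 1) * kMaxOut + i];
    }

    // Rows are filled left to right, so a sink (no outgoing edges)
    // is a vertex whose first slot is still 0.
    bool is_sink(std::uint32_t v) const { return slot(v, 0) == 0; }

    // Collects all sinks in increasing vertex order.
    std::vector<std::uint32_t> sinks() const {
        std::vector<std::uint32_t> out;
        for (std::uint32_t v = 1; v <= n_; ++v) {
            if (is_sink(v)) out.push_back(v);
        }
        return out;
    }

private:
    std::size_t n_;
    std::vector<std::uint32_t> slots_;
};

// Builds the three-vertex example from the table: 1 -> 2, 1 -> 3, 2 -> 3.
inline FixedAdjacency make_example() {
    FixedAdjacency g(3);
    g.add_edge(1, 2);
    g.add_edge(1, 3);
    g.add_edge(2, 3);
    return g;
}
```

With 32-bit ids this costs 16 bytes per vertex regardless of how many slots are used, which is the price of the fixed width; if most vertices have only one outgoing edge, a narrower row may be worth considering.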
An incidence list, on the other hand, requires you to define a vertex list, an edge list, and the connection elements between them (incidence list - graph).
Both are good in terms of space usage compared to an adjacency matrix since your graph is very sparse, as you stated.
My suggestion would be to go with the adjacency list, which you can initialize as a contiguous N×4 array in memory (since you say that you will have at most 4 edges per vertex). This representation will be faster to initialize. (It will also perform better in terms of cache efficiency.)
However, if you expect the size of your graph to change dynamically and frequently, incidence lists might be better, since they are generally implemented as linked lists occupying non-contiguous memory (see the link above). Repeatedly deallocating and reallocating the adjacency array might be undesirable in that case.
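On the second part of the question (alternatives to BFS that avoid a large intermediate queue), one commonly used option is a disjoint-set (union-find) structure. The sketch below is only an illustration under my own assumptions: the names `DisjointSets` and `count_weak_components` are made up, vertices are 0-based here, and edge direction is ignored, so it counts weakly connected components. It needs two arrays of size V and no queue, and edges can be fed in as they are discovered during initialization.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical disjoint-set (union-find) with union by size and path halving.
class DisjointSets {
public:
    explicit DisjointSets(std::size_t n)
        : parent_(n), size_(n, 1), count_(n) {
        // Every vertex starts as the root of its own set.
        std::iota(parent_.begin(), parent_.end(), 0u);
    }

    // Returns the representative of x's set, shortening the path as it goes.
    std::uint32_t find(std::uint32_t x) {
        while (parent_[x] != x) {
            parent_[x] = parent_[parent_[x]];  // path halving
            x = parent_[x];
        }
        return x;
    }

    // Merges the sets containing a and b (no-op if already merged).
    void unite(std::uint32_t a, std::uint32_t b) {
        a = find(a);
        b = find(b);
        if (a == b) return;
        if (size_[a] < size_[b]) std::swap(a, b);
        parent_[b] = a;          // attach the smaller tree under the larger
        size_[a] += size_[b];
        --count_;
    }

    std::size_t component_count() const { return count_; }

private:
    std::vector<std::uint32_t> parent_;
    std::vector<std::uint32_t> size_;
    std::size_t count_;
};

// Counts weakly connected components: each directed edge (u, v) is treated
// as an undirected link between u and v. Vertices are 0-based in this sketch.
inline std::size_t count_weak_components(
    std::size_t vertex_count,
    const std::vector<std::pair<std::uint32_t, std::uint32_t>>& edges) {
    DisjointSets sets(vertex_count);
    for (const auto& e : edges) {
        sets.unite(e.first, e.second);
    }
    return sets.component_count();
}
```

If strongly connected components are what is needed instead, this approach does not apply as is, since it discards edge direction.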