There might be similar questions but I still have some parts that I couldn't figure out. I'm trying represent an undirected graph with no weights but just 1 for connected and 0 for not connected. I'm trying to represent a graph (reading from a file) which have 80500 nodes and over 5.5 million edges. I was wondering;
'large graphs' are graphs the exploration of which require long computation times on a typical quad-core machine compared to what people judge reasonable (an hour, a day.
A graph consists of a set of nodes or vertices together with a set of edges or arcs where each edge joins two vertices. Unless otherwise specified, a graph is undirected: each edge is an unordered pair {u,v} of vertices, and we don't regard either of the two vertices as having a distinct role from the other.
A graph can be represented using 3 data structures- adjacency matrix, adjacency list and adjacency set.
Adjacency Matrix is a 2D array of size V x V where V is the number of vertices in a graph. Let the 2D array be adj[][], a slot adj[i][j] = 1 indicates that there is an edge from vertex i to vertex j. Adjacency matrix for undirected graph is always symmetric. Adjacency Matrix is also used to represent weighted graphs.
Adjacency lists are wayyyy more better space-wise. Because then you just need to save 5.5 million * 2 numbers = 11 000 000 integers. Assuming you save short integers (2 bytes), then you need 22 000 000 bytes.
If you represent it using adjacency matrix, then you need to save 80500 * 80500 = 6 480 250 000 elements. Even is you save them as bytes, having 22 million bytes is much better than having over 6 billion of them.
EDIT: If you save eges as two 4-byte integers, then you have 44 000 000 bytes. If you save the matrix very efficiently with bit fiddling, then you can save 8 elements in one byte. But is means you still need to have 810 031 250 bytes. Not that large difference now, but it's still 20 times more.
If your data is not sparse then you may not get as much space savings with an adjacency list. You could use an adjacency matrix with compressed or encoded rows (or columns, but your graph is unidirected, so compressing rows is probably more natural for lookups). With compression you will reduce space, at the time cost of decompressing rows on lookup.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With