I currently have a graph that has about 10 million nodes and 35 million edges. For now the complete graph is loaded into memory at program start. This takes a couple of minutes (it is Java after all) and needs about half a gigabyte of RAM. For now it runs on a machine with a dual core processor and 4 gigabytes of RAM.
When the graph is searched using a breadth-first-search the memory usage rises to a peak of one gigabyte and it takes ten seconds on average.
I would like to deploy the program on a couple of computers. The functionality apart from the graph search does take very little resources. My target system is very miniature and has only 512 megabytes of RAM.
Any suggestions on how to implement a method (probably using a database) to search that graph without consuming too much memory? The program is idle most of the time as it is accessing a hardware device, so the path-finding could take about 5 minutes max for the mentioned graph...
Thanks for any thoughts thrown in my direction.
UPDATE:
Just found neo4j. Anybody knows if it would be suitable for this kind of humongous graph?
For implementation, BFS uses a queue data structure, while DFS uses a stack. BFS uses a larger amount of memory because it expands all children of a vertex and keeps them in memory. It stores the pointers to a level's child nodes while searching each level to remember where it should go when it reaches a leaf node.
You can speed up BFS from a source to a target by doing a bi-directional search. A bi-directional search is basically doing a BFS from the source and from the target at the same time, one step from each - until the two fronts meet each other.
Breadth-first search is a graph traversal algorithm that starts traversing the graph from the root node and explores all the neighboring nodes. Then, it selects the nearest node and explores all the unexplored nodes. While using BFS for traversal, any node in the graph can be considered as the root node.
Breadth First Search (BFS) algorithm traverses a graph in a breadthward motion and uses a queue to remember to get the next vertex to start a search, when a dead end occurs in any iteration. As in the example given above, BFS algorithm traverses from A to B to E to F first then to C and G lastly to D.
Your question is a little vague, but in general, a good strategy that mostly follows breadth first semantics while using the same amount of memory as depth-first search is Iterative Deepening. The idea is that you do a depth-first search limited to 1 level at first; if that fails to find a solution, start from scratch and limit it to 2 levels; if that fails, try 3 levels, and so on.
This may seem a bit redundant at first, but since you're doing a depth-first search, you keep much fewer nodes in memory, and always search one less level than a straightforward breadth-first search. Since the amount of nodes in a level grows exponentially, on larger graphs, it's very likely that saving that one last extra level pays off for trying all preceding layers redundantly.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With