Disclaimer: The author is a newbie in Erlang.
Imagine, we have a graph consisting of 1M nodes, and each node has 0-4 neighbours (the edges are emanating from each node to those neighbours, so the graph is directed and connected).
Here is my choice of data structures:
To store the graph I use digraph, which is based on ETS tables. This allows fast (O(1)) access to the neighbours of a node.
For the list of unvisited nodes, I use gb_sets:take_smallest (the node is already sorted, and it is simultaneously deleted after fetching).
For the list of predecessors I use the dict structure, which allows to store the predecessors in the following way: {Node1,Node1_predecessor},{Node2,Node2_predecessor}.
For the list of visited nodes I use a simple list.
Problems:
UPD:
Here is the code based on the recommendations of Antonakos:
dijkstra(Graph,Start_node_name) ->
io:format("dijkstra/2: start~n"),
Paths = dict:new(),
io:format("dijkstra/2: initialized empty Paths~n"),
Unvisited = gb_sets:new(),
io:format("dijkstra/2: initialized empty Unvisited nodes priority queue~n"),
Unvisited_nodes = gb_sets:insert({0,Start_node_name,root},Unvisited),
io:format("dijkstra/2: Added start node ~w with the weight 0 to the Unvisited nodes: ~w~n", [Start_node_name, Unvisited_nodes]),
Paths_updated = loop_through_nodes(Graph,Paths,Unvisited_nodes),
io:format("dijkstra/2: Finished searching for shortest paths: ~w~n", [Paths_updated]).
loop_through_nodes(Graph,Paths,Unvisited_nodes) ->
%% We need this condition to stop looping through the Unvisited nodes if it is empty
case gb_sets:is_empty(Unvisited_nodes) of
false ->
{{Current_weight,Current_name,Previous_node}, Unvisited_nodes_updated} = gb_sets:take_smallest(Unvisited_nodes),
case dict:is_key(Current_name,Paths) of
false ->
io:format("loop_through_nodes: Found a new smallest unvisited node ~w~n",[Current_name]),
Paths_updated = dict:store(Current_name,{Previous_node,Current_weight},Paths),
io:format("loop_through_nodes: Updated Paths: ~w~n",[Paths_updated]),
Out_edges = digraph:out_edges(Graph,Current_name),
io:format("loop_through_nodes: Ready to iterate through the out edges of node ~w: ~w~n",[Current_name,Out_edges]),
Unvisited_nodes_updated_2 = loop_through_edges(Graph,Out_edges,Paths_updated,Unvisited_nodes_updated,Current_weight),
io:format("loop_through_nodes: Looped through out edges of the node ~w and updated Unvisited nodes: ~w~n",[Current_name,Unvisited_nodes_updated_2]),
loop_through_nodes(Graph,Paths_updated,Unvisited_nodes_updated_2);
true ->
loop_through_nodes(Graph,Paths,Unvisited_nodes_updated)
end;
true ->
Paths
end.
loop_through_edges(Graph,[],Paths,Unvisited_nodes,Current_weight) ->
io:format("loop_through_edges: No more out edges ~n"),
Unvisited_nodes;
loop_through_edges(Graph,Edges,Paths,Unvisited_nodes,Current_weight) ->
io:format("loop_through_edges: Start ~n"),
[Current_edge|Rest_edges] = Edges,
{Current_edge,Current_node,Neighbour_node,Edge_weight} = digraph:edge(Graph,Current_edge),
case dict:is_key(Neighbour_node,Paths) of
false ->
io:format("loop_through_edges: Inserting new neighbour node ~w into Unvisited nodes~n",[Current_node]),
Unvisited_nodes_updated = gb_sets:insert({Current_weight+Edge_weight,Neighbour_node,Current_node},Unvisited_nodes),
io:format("loop_through_edges: The unvisited nodes are: ~w~n",[Unvisited_nodes_updated]),
loop_through_edges(Graph,Rest_edges,Paths,Unvisited_nodes_updated,Current_weight);
true ->
loop_through_edges(Graph,Rest_edges,Paths,Unvisited_nodes,Current_weight)
end.
Explanation: Minimum priority queue is the most commonly used data structure for implementing Dijkstra's Algorithm because the required operations to be performed in Dijkstra's Algorithm match with specialty of a minimum priority queue.
According to this page, Dijkstra's algorithm is just BFS with a priority queue.
Running time of DFS is O(V + E), Dijkstra is O((V + E) log V). Noticed Dijkstra has log V added, it is the cost of adding to the heap, hence it is slower than DFS. Most people prefer Dijkstra to DFS in pathfinding because Dijkstra is so accurate. Well, Dijkstra finds the shortest path from the starting point.
Your choice of data structures looks OK, so it is mostly a question of what to insert in them and how to keep them up to date. I'd suggest the following (I have changed the names a bit):
Result
: A dict
mapping Node
to {Cost, Prev}
, where Cost
is the total cost of the path to Node
and Prev
is its predecessor on the path.
Open
: A gb_sets
structure of {Cost, Node, Prev}
.
A graph with edges of the form {EdgeCost, NextNode}
.
The result of the search is represented by Result
and the graph isn't updated at all. There is no multiprocessing or message passing.
The algorithm goes as follows:
Insert {0, StartNode, Nil}
in Open
, where Nil
is something that marks the end of the path.
Let {{Cost, Node, Prev}, Open1} = gb_sets:take_smallest(Open)
. If Node
is already in Result
then do nothing; otherwise add {Cost, Node, Prev}
to Result
, and for every edge {EdgeCost, NextNode}
of Node
add {Cost + EdgeCost, NextNode, Node}
to Open1
, but only if NextNode
isn't already in Result
. Continue with Open1
until the set is empty.
Dijkstra's algorithm asks for a decrease_key()
operation on the Open
set. Since this isn't readily supported by gb_sets
we have used the workaround of inserting a tuple for NextNode
even if NextNode
might be present in Open
already. That's why we check if the node extracted from Open
is already in Result
.
Extended discussion of the use of the priority queue
There are several ways of using a priority queue with Dijkstra's algorithm.
In the standard version of Wikipedia a node v
is inserted only once but the position of v
is updated when the cost and predecessor of v
is changed.
alt := dist[u] + dist_between(u, v)
if alt < dist[v]:
dist[v] := alt
previous[v] := u
decrease-key v in Q
Implementations often simplify by replacing decrease-key v in Q
with add v to Q
. This means that v
can be added more than once, and the algorithm must therefore check that an u
extracted from the queue hasn't already been added to the result.
In my version I am replacing the entire block above with add v to Q
. The queue will therefore contain even more entries, but since they are always extracted in order it doesn't affect the correctness of the algorithm. If you don't want these extra entries, you can use a dictionary to keep track of the minimum cost for each node.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With