I'm considering the following problem (very rough description):
Assume we have a graph where edges are assigned some non-negative costs, a starting node s
and some cost constant C
. Find out:
N
, reachable from s
where the cost of the shortest path from the starting node s
to any node in N
is not greater than C
.e
in the set the above - the cost of the shortest path.Basically Dijkstra with the cost constraint.
My primary question is: what is the correct terminology in the graph theory for this problem?
I've been looking at "accessibility" or "reachability" but these seem to be the wrong keywords.
Ultimately I'm looking for an algorithm which could efficiently answer many such queries on one (non-modifable) but quite large (potentially ~100 million edges) graph in parallel. I'd like to check literature but need hints on the right keywords.
Update: My practical problem is as follows.
Assume we're given a continental-size road network (modelled as a directed graph, with some properties on edges and nodes like if it's a pedestrian way or highway). Edge cost is travel time.
I'd like to answer user queries like: starting from some given position (graph node), which nodes are reachable within 1 hour?
I'd also like to answer these queries very fast (<200-300ms, if possible) for many users (>10000, if possible) in parallel.
I think there should be at least two optimizations possible:
I have a few ideas on my own but I'd first like to check the literature to avoid reinventing the wheel.
Two nodes X and Y are said to be reachable if we can start at X and end at Y using any number of edges. Note : A Node is reachable to itself.
Description. k = findnode( G , nodeID ) returns the numeric node ID, k , of the node in graph G whose name or index is nodeID . The numeric node ID is zero if the node is not in the graph.
Note: A node 'u' is said to be reachable from node 'v', if there exists a path between node' u' and 'v'.
The correct terminology to the problem you are dealing with is in the family of "shortest path algorithms". Reachability problems (i.e. Warshal) deal with the question "is there a path between nodes A and B?" and it has a binary answer, you don't need weights in this case, you only look for edges. But in your case you need to take into consideration the time it takes to travel from node A to node B in every case.
Now, there is no "exact" fit for this problems because a small change on Dijktra's, Floyd's, BFS or DFS can be used to solve this problem, but you have an added complexity because of the size of the graph this is important to understand how to build your solution. Which algorithm to use depends on your specific combination of time constraints and available hardware you have.
There are 2 algorithmic solutions to your problem (I'm assuming you already have all the edges stored somewhere and that you can query this database somehow):
In an ideal (imaginary) world you would run a Floyd's algorithm, save the resulting matrix in something like Redis and that's it, you can serve your requests in less than 10 ms, if the number of clients grow you can add more redis servers as needed since the graph is not likely to change very often (because in your specific problem you have road information). The problem with this is the space complexity of the solution, the best thing about it is that everything is precomputed, which means your response time to requests is minimal. To be able to implement some variation of this you need a lot of space, even a redis cluster with a DB stored on disk (yes disk, not memory) and all the servers with SSDs should be enough, this option scales well when the number of concurrent clients grow and the time response is quite fast. But whether or not you can use this solution depends on the number of nodes in your graph: even tho you need to precompute the distances using each edge you only need space to store a matrix of NxN being N the number of nodes in your graph, if this matrix fits in redis then you can use this algorithm and precompute all the "distances" (in your case this is the sum of the costs a.k.a "travel times") between all the nodes. Later on when you receive a request you need to query the resulting matrix to fetch all the nodes with a travel time less than a given value, there is an additional optimization when storing this matrix in redis than can allow you to fetch the node numbers quite fast using sorted sets.
Then you have the second solution which is to modify Dijktra, BFS or DFS to prune the search based on the cost and that's it. In this scenario you have already discarded Floyd because of the high space complexity, which means that your graph is rather huge both in nodes and edges. In this case the solution is almost the same using a variation of any of the algorithms, which makes the difference is how are you storing the graph. All 3 algorithms can be modified to efficiently respond in the time you want but to support over 10k clients the database you use to store the graph makes the difference. And here you have another 2 options:
Hope it helps!
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With