
Graph theory - learn cost function to find optimal path

This is a supervised learning problem.

I have a directed acyclic graph (DAG). Each edge has a vector of features X, and each node (vertex) has a label 0 or 1. The task is to find a cost function w(X), so that the shortest path between any pair of nodes has the highest ratio of 1s to 0s (minimum classification error).

The solution must generalize well. I tried logistic regression, and the learned logistic function predicts the label of a node fairly well given the features of an incoming edge. However, that approach does not take the graph's topology into account, so the solution over the whole graph is non-optimal. In other words, the logistic function is not a good weight function given the problem setup above.

Although my problem setup is not the typical binary classification problem setup, here is a good intro to it: http://en.wikipedia.org/wiki/Supervised_learning#How_supervised_learning_algorithms_work

Here are some more details:

  1. Each feature vector X is a d-dimensional list of real numbers.
  2. Each edge has a vector of features. That is, given the set of edges E = {e1, e2, .. en} and set of feature vectors F = {X1, X2 ... Xn}, then edge ei is associated to vector Xi.
  3. It is possible to come up with a function f(X), so that f(Xi) gives the likelihood that edge ei points to a node labeled with a 1. An example of such function is the one I mentioned above, found through logistic regression. However, as I mentioned above, such function is non-optimal.

SO THE QUESTION IS: Given the graph, a starting node and a finish node, how do I learn the optimal cost function w(X), so that the ratio of 1-nodes to 0-nodes along the path is maximized (minimum classification error)?
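To make the setup concrete, here is a minimal sketch of how a learned edge scorer could drive path search. It assumes a hypothetical DAG representation `{node: [(child, feature_vector), ...]}` and a learned function `f(X)` in (0, 1) estimating the probability that an edge points to a 1-node; one natural cost is then w(X) = -log f(X), so that minimizing total path cost maximizes the product of edge likelihoods. All names here are illustrative, not from the question.

```python
import math

def edge_cost(f, X):
    # Turn a probability estimate into an additive cost; guard against log(0).
    return -math.log(max(f(X), 1e-12))

def shortest_path_dag(graph, topo_order, f, start, goal):
    """Single-source shortest path on a DAG via one pass in topological order."""
    dist = {v: math.inf for v in topo_order}
    prev = {}
    dist[start] = 0.0
    for u in topo_order:
        if dist[u] == math.inf:
            continue
        for v, X in graph.get(u, []):
            c = dist[u] + edge_cost(f, X)
            if c < dist[v]:
                dist[v] = c
                prev[v] = u
    # Reconstruct the path by walking predecessors back from the goal.
    path, v = [goal], goal
    while v != start:
        v = prev[v]
        path.append(v)
    return list(reversed(path))

# Toy example: f simply reads the first feature as a probability.
g = {"s": [("a", [0.9]), ("b", [0.2])],
     "a": [("t", [0.8])],
     "b": [("t", [0.9])]}
f = lambda X: X[0]
print(shortest_path_dag(g, ["s", "a", "b", "t"], f, "s", "t"))  # -> ['s', 'a', 't']
```

The open problem in the question is how to *learn* `f` (or `w` directly) so that the resulting shortest paths maximize the ratio of 1s, rather than fitting `f` per-edge as logistic regression does.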

Diego, asked Nov 04 '12


3 Answers

This is not really an answer, but we need to clarify the question. I might come back later for a possible answer though.

Below is an example DAG.

[image: example DAG with labeled nodes]

Suppose the red node is the starting node, and the yellow one is the end node. How do you define the shortest path in terms of

the highest ratio of 1s to 0s (minimum classification error) ?

Edit: I added names for each node and two example names for the top two edges.

It seems to me that you cannot learn such a cost function, one that takes feature vectors as inputs and whose outputs (edge weights, or whatever) guide you along a shortest path toward any node while accounting for the graph topology. The reasons are stated below:

  • Let's assume you don't have the feature vectors you stated. Given a graph like the one above, if you want to find all-pairs shortest paths with respect to the ratio of 1s to 0s, you can simply use the Bellman equation, or more specifically Dijkstra's algorithm plus a proper heuristic function (e.g., the percentage of 1s on the path). Another possible model-free approach is Q-learning, in which we get a reward of +1 for visiting a 1-node and -1 for visiting a 0-node. We learn a lookup Q-table for each target node, one at a time. Once every node has been treated as a target node, we have the all-pairs shortest paths.

  • Now suppose you magically obtained the feature vectors. Since you are able to find the optimal solution without those vectors, how would they help when they do exist?

  • There is one condition under which you could use the feature vectors to learn a cost function that optimizes edge weights: the feature vectors would have to depend on the graph topology (the links between nodes and the positions of the 1s and 0s). But I did not see this dependency in your description at all, so I guess it does not exist.
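The first bullet's point, that labels and topology alone suffice, can be sketched without any feature vectors. The snippet below finds the path maximizing (#1-nodes minus #0-nodes), the additive surrogate of the +1/-1 reward mentioned above, with a single dynamic-programming pass in topological order. The graph and names are illustrative.

```python
def best_path_by_labels(children, labels, topo_order, start, goal):
    # Reward +1 for visiting a 1-node, -1 for a 0-node, as described above.
    reward = {v: (1 if labels[v] == 1 else -1) for v in topo_order}
    best = {v: float("-inf") for v in topo_order}
    prev = {}
    best[start] = reward[start]
    for u in topo_order:
        if best[u] == float("-inf"):
            continue
        for v in children.get(u, []):
            if best[u] + reward[v] > best[v]:
                best[v] = best[u] + reward[v]
                prev[v] = u
    # Walk predecessors back from the goal to recover the best path.
    path, v = [goal], goal
    while v != start:
        v = prev[v]
        path.append(v)
    return list(reversed(path))

children = {"s": ["a", "b"], "a": ["t"], "b": ["t"]}
labels = {"s": 1, "a": 1, "b": 0, "t": 1}
print(best_path_by_labels(children, labels, ["s", "a", "b", "t"], "s", "t"))
# -> ['s', 'a', 't'] (avoids the 0-labeled node b)
```

Note this maximizes an additive count difference rather than the ratio itself; maximizing a ratio directly is not additive along edges, which is part of why the question is hard.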

greeness, answered Oct 06 '22


This looks like a problem where a genetic algorithm has excellent potential. If you define the desired function as, e.g. (but not limited to), a linear combination of the features (you could add quadratic terms, then cubic, ad infinitum), then the gene is the vector of coefficients. The mutator can be just a random offset of one or more coefficients within a reasonable range. The evaluation function is the average ratio of 1s to 0s along shortest paths over all pairs under the current mutation. At each generation, pick the best few genes as ancestors and mutate them to form the next generation. Repeat until the über-gene is at hand.
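A compact sketch of that loop, under stated assumptions: the gene is the coefficient vector of a linear cost function, and the fitness function is supplied by the caller (in the real problem it would run shortest paths under the candidate coefficients and return the average ratio of 1s, which needs the actual graph). The toy fitness below and all names are illustrative stand-ins.

```python
import random

def mutate(coeffs, scale=0.1):
    # Random offset of one coefficient within a reasonable range.
    out = list(coeffs)
    i = random.randrange(len(out))
    out[i] += random.uniform(-scale, scale)
    return out

def evolve(fitness, dim, pop_size=20, elite=4, generations=50, seed=0):
    random.seed(seed)
    pop = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)   # best genes first
        parents = pop[:elite]                 # keep the best few as ancestors
        pop = parents + [mutate(random.choice(parents))
                         for _ in range(pop_size - elite)]
    return max(pop, key=fitness)

# Toy fitness: negative squared distance to a hidden target vector,
# standing in for "average ratio of 1s along shortest paths".
target = [0.5, -0.3]
fitness = lambda c: -sum((a - b) ** 2 for a, b in zip(c, target))
best = evolve(fitness, dim=2)
print(best)
```

Because the elite genes are carried over unchanged, the best fitness in the population never decreases between generations.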

Gene, answered Oct 06 '22


I believe your question is very close to the field of Inverse Reinforcement Learning, where you take certain "expert demonstrations" of optimal paths and try to learn a cost function such that your planner (A* or some reinforcement learning agent) outputs the same path as the expert demonstration. This training is done iteratively. In your case, the expert demonstrations could be created by you, as paths that pass through the maximum number of 1-labelled nodes. Here is a link to a good paper on this: Learning to Search: Functional Gradient Techniques for Imitation Learning. It is from the robotics community, where motion planning is usually set up as a graph-search problem and learning cost functions is essential for reproducing desired behavior.
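One iteration of that idea can be sketched as follows, in the spirit of max-margin planning: with a linear cost w(X) = theta · X, plan under the current theta, then nudge theta so the expert path's features become cheaper and the planner's current path becomes costlier. This is a simplified illustration of the approach, not the cited paper's exact algorithm; the edge features and paths are made up.

```python
def feature_sum(path_edges, features):
    # Sum the feature vectors of all edges along a path.
    d = len(next(iter(features.values())))
    total = [0.0] * d
    for e in path_edges:
        for i, x in enumerate(features[e]):
            total[i] += x
    return total

def update_theta(theta, expert_edges, planner_edges, features, lr=0.1):
    f_exp = feature_sum(expert_edges, features)
    f_pln = feature_sum(planner_edges, features)
    # Subgradient step: lower the cost of expert-path features,
    # raise the cost of the planner's current path.
    return [t + lr * (fp - fe) for t, fe, fp in zip(theta, f_exp, f_pln)]

features = {("s", "a"): [1.0, 0.0], ("a", "t"): [1.0, 0.0],
            ("s", "b"): [0.0, 1.0], ("b", "t"): [0.0, 1.0]}
theta = [0.0, 0.0]
theta = update_theta(theta, [("s", "a"), ("a", "t")],   # expert's path
                     [("s", "b"), ("b", "t")], features)  # planner's path
print(theta)  # dimension 0 (expert features) now cheaper, dimension 1 costlier
```

Repeating plan-then-update until the planner reproduces the expert paths yields a cost function consistent with the demonstrations.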

mohakbh, answered Oct 06 '22