The question:
Find shortest path between two articles in english Wikipedia. Path between article A and B exist if there are articles C(i) and there is a link in article A that leads to article C(1), in article C(1) link that leads to article C(2), ..., in article C(n) is link that leads to article B
I'm using Python. URL to download wikipedia article:
I have edited my source code, but it still does not work when I include those articles in codes can any one tell me what am I messing here?
This is my code:
import urllib2
import re
import xml.etree.ElementTree as ET
text = ET.fromstring(F_D.text.encode('UTF-8'))
text = ET.fromstring(P.text.encode('UTF-8'))
F_D=requests.get('http://en.wikipedia.org/wiki/Ant_colony_optimization_algorithms')
P=requests.get('http://en.wikipedia.org/wiki/Wikipedia:Unusual_articles')
links = text.findall('.//*[@id=”mw-content-text”]/p/a')
links=E_D
E_D = graph_dict
E_D[start] = 0
for vertex in E_D:
F_D[vertex] = E_D[vertex]
if vertex == end: break
for edge in graph[vertex]:
path_distance = F_D[vertex] + graph[vertex][edge]
if edge in F_D:
if path_distance < F_D[edge]:
#raise ValueError,
elif edge not in E_D or path_distance < E_D[edge]:
E_D[edge] = path_distance
[edge] = vertex
return (F_D,P)
def Shortest_Path(graph,start,end):
F_D,P = D_Algorithm(graph,start,end)
path = []
while 1:
path.append(end)
if end == start: break
end = P[end]
path.reverse()
return path
The all-pairs shortest path problem, in which we have to find shortest paths between every pair of vertices v, v' in the graph. These generalizations have significantly more efficient algorithms than the simplistic approach of running a single-pair shortest path algorithm on all relevant pairs of vertices. Algorithms [ edit ]
In graph theory, the shortest path problem is the problem of finding a path between two vertices (or nodes) in a graph such that the sum of the weights of its constituent edges is minimized. The problem of finding the shortest path between two intersections on a road map may be modeled as a special case...
The algorithm with the fastest known query time is called hub labeling and is able to compute shortest path on the road networks of Europe or the US in a fraction of a microsecond. Other techniques that have been used are: For shortest path problems in computational geometry, see Euclidean shortest path .
The all-pairs shortest paths problem for unweighted directed graphs was introduced by Shimbel (1953), who observed that it could be solved by a linear number of matrix multiplications that takes a total time of O(V4) . Thorup 1999 applied to every vertex (requires constant-time multiplication).
We are looking at graph exploration... why should you be considering Dijkstra's algorithm??? IMHO... change the approach.
First, you need a good heuristic function. For every node you expand, you need to geusstimate the distance of that node from the target/goal node. Now... how you compute the heuristic is the real challenge here. You may perhaps do a keyword mapping between the current wiki page and your destination page. A percentage of match may give you the estimate. Or... try to guess the relevance of content between the two pages. I have a hunch... perhaps a Neural Network may help you here. But, this may not indicate optimal estimate either. I'm not sure. Once you figure out a suitable way of doing this, use A* search algorithm.
Search and explore the heuristic function, do not go for breadth first search, you'll end up no where in the vast wide world of wikipedia!
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With