Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Find shortest path between two articles in english Wikipedia in Python

The question:

Find shortest path between two articles in english Wikipedia. Path between article A and B exist if there are articles C(i) and there is a link in article A that leads to article C(1), in article C(1) link that leads to article C(2), ..., in article C(n) is link that leads to article B

I'm using Python. URL to download wikipedia article:

  1. http://en.wikipedia.org/wiki/Nazwa_artykułu
  2. http://en.wikipedia.org/w/index.php?title?Nazwa_artykułu&printable=yes
  3. Wikipedia API

I have edited my source code, but it still does not work when I include those articles in codes can any one tell me what am I messing here?

This is my code:

import urllib2
import re
import xml.etree.ElementTree as ET

text = ET.fromstring(F_D.text.encode('UTF-8'))
text = ET.fromstring(P.text.encode('UTF-8'))
F_D=requests.get('http://en.wikipedia.org/wiki/Ant_colony_optimization_algorithms')
P=requests.get('http://en.wikipedia.org/wiki/Wikipedia:Unusual_articles')  
links = text.findall('.//*[@id=”mw-content-text”]/p/a')

links=E_D

E_D = graph_dict
E_D[start] = 0

for vertex in E_D:
    F_D[vertex] = E_D[vertex]
    if vertex == end: break

    for edge in graph[vertex]:
        path_distance = F_D[vertex] + graph[vertex][edge]
        if edge in F_D:
            if path_distance < F_D[edge]:
                #raise ValueError,
            elif edge not in E_D or path_distance < E_D[edge]:
                E_D[edge] = path_distance
                [edge] = vertex
return (F_D,P)

def Shortest_Path(graph,start,end):
  F_D,P = D_Algorithm(graph,start,end)
  path = []
  while 1:
    path.append(end)
    if end == start: break
    end = P[end]
  path.reverse()
  return path
like image 572
Jefferson X Masonic Avatar asked Apr 13 '13 17:04

Jefferson X Masonic


People also ask

What is the all pairs shortest path problem?

The all-pairs shortest path problem, in which we have to find shortest paths between every pair of vertices v, v' in the graph. These generalizations have significantly more efficient algorithms than the simplistic approach of running a single-pair shortest path algorithm on all relevant pairs of vertices. Algorithms [ edit ]

What is the shortest path problem in graph theory?

In graph theory, the shortest path problem is the problem of finding a path between two vertices (or nodes) in a graph such that the sum of the weights of its constituent edges is minimized. The problem of finding the shortest path between two intersections on a road map may be modeled as a special case...

What is the fastest algorithm for finding the shortest path?

The algorithm with the fastest known query time is called hub labeling and is able to compute shortest path on the road networks of Europe or the US in a fraction of a microsecond. Other techniques that have been used are: For shortest path problems in computational geometry, see Euclidean shortest path .

How do you solve the shortest path problem?

The all-pairs shortest paths problem for unweighted directed graphs was introduced by Shimbel (1953), who observed that it could be solved by a linear number of matrix multiplications that takes a total time of O(V4) . Thorup 1999 applied to every vertex (requires constant-time multiplication).


1 Answers

We are looking at graph exploration... why should you be considering Dijkstra's algorithm??? IMHO... change the approach.

First, you need a good heuristic function. For every node you expand, you need to geusstimate the distance of that node from the target/goal node. Now... how you compute the heuristic is the real challenge here. You may perhaps do a keyword mapping between the current wiki page and your destination page. A percentage of match may give you the estimate. Or... try to guess the relevance of content between the two pages. I have a hunch... perhaps a Neural Network may help you here. But, this may not indicate optimal estimate either. I'm not sure. Once you figure out a suitable way of doing this, use A* search algorithm.

Search and explore the heuristic function, do not go for breadth first search, you'll end up no where in the vast wide world of wikipedia!

like image 139
metsburg Avatar answered Sep 22 '22 13:09

metsburg