Find the shortest path between two articles in English Wikipedia in Python

Tags: python, algorithm, dijkstra

The question:

Find the shortest path between two articles in English Wikipedia. A path between articles A and B exists if there are articles C(1), ..., C(n) such that article A contains a link to C(1), C(1) contains a link to C(2), ..., and C(n) contains a link to article B.

I'm using Python. URLs for downloading a Wikipedia article (Nazwa_artykułu, Polish for "article name", stands for the article title):

http://en.wikipedia.org/wiki/Nazwa_artykułu
http://en.wikipedia.org/w/index.php?title=Nazwa_artykułu&printable=yes
Wikipedia API
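
For the Wikipedia API option, here is a minimal sketch of how the links of one article might be listed, assuming the MediaWiki action API (action=query with prop=links). The function names fetch_links and titles_from_response and the User-Agent string are made up for illustration; only the standard library is used.

```python
import json
import urllib.parse
import urllib.request

API_URL = 'https://en.wikipedia.org/w/api.php'


def titles_from_response(data):
    """Collect the linked titles from one decoded prop=links response."""
    titles = []
    for page in data.get('query', {}).get('pages', {}).values():
        for link in page.get('links', []):
            titles.append(link['title'])
    return titles


def fetch_links(title):
    """Return the titles linked from `title` (hypothetical helper)."""
    params = {
        'action': 'query',
        'format': 'json',
        'titles': title,
        'prop': 'links',
        'plnamespace': '0',  # assumption: articles only; drop to include other namespaces
        'pllimit': 'max',
    }
    links = []
    while True:
        url = API_URL + '?' + urllib.parse.urlencode(params)
        request = urllib.request.Request(
            url, headers={'User-Agent': 'shortest-path-sketch/0.1 (example)'})
        with urllib.request.urlopen(request) as response:
            data = json.load(response)
        links.extend(titles_from_response(data))
        if 'continue' not in data:
            return links
        # Long link lists are split up; the API says how to ask for the next part.
        params.update(data['continue'])
```

Calling fetch_links('Ant_colony_optimization_algorithms') would make one or more HTTP requests and return a list of titles, which could serve as the neighbours of that article in the graph.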

I have edited my source code, but it still does not work when I plug those articles into the code. Can anyone tell me what I am missing here?

This is my code:

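import heapq
import re
import requests

WIKI_URL = 'http://en.wikipedia.org/wiki/'

def get_links(title):
    # Download one article and return the titles it links to.
    html = requests.get(WIKI_URL + title, headers={'User-Agent': 'shortest-path-question/0.1'}).text
    # Titles are taken from href="/wiki/Title"; fragments (#...) are cut off.
    return set(re.findall(r'href="/wiki/([^"#?]+)', html))

class WikiGraph(dict):
    # graph[title] is {linked_title: 1}; a page is downloaded on first access.
    def __missing__(self, title):
        self[title] = dict.fromkeys(get_links(title), 1)
        return self[title]

def D_Algorithm(graph, start, end):
    F_D = {}          # final distances
    P = {}            # predecessor of each vertex on its shortest path
    E_D = {start: 0}  # estimated distances
    queue = [(0, start)]
    while queue:
        distance, vertex = heapq.heappop(queue)
        if vertex in F_D:
            continue  # stale queue entry, vertex is already final
        F_D[vertex] = distance
        if vertex == end:
            break
        for edge in graph[vertex]:
            path_distance = F_D[vertex] + graph[vertex][edge]
            if edge in F_D:
                if path_distance < F_D[edge]:
                    raise ValueError('better path found to an already-final vertex')
            elif edge not in E_D or path_distance < E_D[edge]:
                E_D[edge] = path_distance
                P[edge] = vertex
                heapq.heappush(queue, (path_distance, edge))
    return (F_D, P)

def Shortest_Path(graph, start, end):
    F_D, P = D_Algorithm(graph, start, end)
    if end not in F_D:
        return None  # end is not reachable from start
    path = []
    while 1:
        path.append(end)
        if end == start: break
        end = P[end]
    path.reverse()
    return path

if __name__ == '__main__':
    # Every vertex that is expanded costs one download, so this can run for a long time.
    print(Shortest_Path(WikiGraph(), 'Ant_colony_optimization_algorithms', 'Wikipedia:Unusual_articles'))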

asked Apr 13 '13 17:04

Jefferson X Masonic

1 Answer

This is a graph exploration problem, so why consider Dijkstra's algorithm? IMHO, change the approach.

First, you need a good heuristic function. For every node you expand, you need to estimate the distance from that node to the target (goal) node. How you compute the heuristic is the real challenge here. You could do a keyword mapping between the current wiki page and your destination page; the percentage of matching keywords may give you the estimate. Or try to gauge how closely the content of the two pages is related. I have a hunch that a neural network may help here, though that may not give an optimal estimate either; I'm not sure. Once you figure out a suitable way of doing this, use the A* search algorithm.

Research and experiment with the heuristic function. Do not go for breadth-first search; you'll end up nowhere in the vast, wide world of Wikipedia!
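
The idea above can be sketched as follows. This is only an illustration under assumptions, not a definitive implementation: the helper names (keyword_overlap_estimate, a_star), the neighbors/keywords callbacks, and the toy data are all hypothetical, and each link is assumed to cost 1. The keyword estimate is capped at one click, so it never overestimates and the returned path is still a shortest one; a more aggressive estimate would prune more pages but lose that guarantee.

```python
import heapq


def keyword_overlap_estimate(words, goal_words):
    """Hypothetical heuristic in [0, 1]: fewer shared keywords, larger estimate."""
    if not goal_words:
        return 0.0
    shared = len(words & goal_words)
    return 1.0 - shared / len(goal_words)


def a_star(neighbors, keywords, start, goal):
    """A* over pages.

    neighbors(title) -> iterable of linked titles (assumed callback)
    keywords(title)  -> set of keywords of that page (assumed callback)
    Returns the list of titles from start to goal, or None if unreachable.
    """
    goal_words = keywords(goal)
    cost_so_far = {start: 0}
    parent = {start: None}
    frontier = [(keyword_overlap_estimate(keywords(start), goal_words), start)]
    closed = set()
    while frontier:
        _, page = heapq.heappop(frontier)
        if page == goal:
            # Walk the parent links back to the start.
            path = []
            while page is not None:
                path.append(page)
                page = parent[page]
            return path[::-1]
        if page in closed:
            continue  # stale queue entry
        closed.add(page)
        for nxt in neighbors(page):
            cost = cost_so_far[page] + 1  # assumption: every link costs 1
            if nxt not in cost_so_far or cost < cost_so_far[nxt]:
                cost_so_far[nxt] = cost
                parent[nxt] = page
                estimate = keyword_overlap_estimate(keywords(nxt), goal_words)
                heapq.heappush(frontier, (cost + estimate, nxt))
    return None


# Made-up toy data standing in for real pages and their keywords.
toy_links = {'A': ['B', 'C'], 'B': ['D'], 'C': ['E'], 'D': ['E'], 'E': []}
toy_words = {
    'A': {'ant'},
    'B': {'colony'},
    'C': {'odd', 'article'},
    'D': {'insect'},
    'E': {'odd', 'article', 'list'},
}

print(a_star(lambda t: toy_links.get(t, []),
             lambda t: toy_words.get(t, set()),
             'A', 'E'))  # prints ['A', 'C', 'E']
```

On real pages, neighbors would download an article and extract its links, and keywords would need some way to pull representative words out of the page text; both are left open here.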

answered Sep 22 '22 13:09

metsburg
