
Python: Check if Wikipedia Article Exists

Tags: python

I'm trying to figure out how to check if a Wikipedia article exists. For example,

https://en.wikipedia.org/wiki/Food

exists, however

https://en.wikipedia.org/wiki/Fod 

does not, and the page simply says, "Wikipedia does not have an article with this exact name."

Thanks!

John asked Jul 24 '15 09:07




2 Answers

>>> import urllib
>>> print urllib.urlopen("https://en.wikipedia.org/wiki/Food").getcode()
200
>>> print urllib.urlopen("https://en.wikipedia.org/wiki/Fod").getcode()
404

Is that OK?

Or:

>>> a = urllib.urlopen("https://en.wikipedia.org/wiki/Fod").getcode()
>>> if a == 404:
...     print "Wikipedia does not have an article with this exact name."
...
Wikipedia does not have an article with this exact name.
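
The session above is Python 2 (`print` statement, `urllib.urlopen`). In Python 3, `urllib.request.urlopen` raises `urllib.error.HTTPError` for a 404 instead of returning a response, so the status has to be read from the exception. A minimal sketch of the same check follows; the helper name `get_status`, its `opener` parameter, and the User-Agent string are illustrative assumptions, not part of the original answer.

```python
import urllib.error
import urllib.request


def get_status(url, opener=urllib.request.urlopen):
    """Return the HTTP status code for url, including 4xx/5xx codes."""
    # Assumption: a descriptive User-Agent header; this value is a placeholder.
    request = urllib.request.Request(
        url, headers={"User-Agent": "article-check-example/0.1"}
    )
    try:
        with opener(request) as response:
            return response.getcode()
    except urllib.error.HTTPError as error:
        # urlopen signals 4xx/5xx responses by raising; the code is on the exception.
        return error.code
```

With this in place, `get_status("https://en.wikipedia.org/wiki/Fod")` should return 404 rather than raise, provided the server still responds the way the question describes.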
Kostia Skrypnyk answered Nov 15 '22 09:11


Basically, most websites and web services report a status code in the HTTP response headers for each request.
In your case, you can simply check the status code: it is 404 when the article does not exist, even though your browser renders what looks like a normal page.

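import requests  # third-party package; the module name is "requests", not "request"

# Fetch the page; the status code comes back with the HTTP response.
result = requests.get('https://en.wikipedia.org/wiki/Food')
if result.status_code == 200:  # the article exists
    pass  # handle the existing article here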
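
To turn the status check into a yes/no answer for an article title, something like the sketch below could work. The names `article_url`, `article_exists`, and the `fetch_status` parameter are hypothetical; `fetch_status` is any callable that takes a URL and returns its HTTP status code. Treating only 200 and 404 as meaningful is an assumption of this sketch, since other codes (rate limiting, server errors) say nothing about whether the article exists.

```python
from urllib.parse import quote

WIKI_BASE = "https://en.wikipedia.org/wiki/"  # base URL taken from the question


def article_url(title):
    """Build the article URL: spaces become underscores, the rest is percent-encoded."""
    return WIKI_BASE + quote(title.replace(" ", "_"))


def article_exists(title, fetch_status):
    """Map the HTTP status for the article URL to True or False."""
    status = fetch_status(article_url(title))
    if status == 200:
        return True
    if status == 404:
        return False
    # Any other status is not evidence either way, so surface it to the caller.
    raise RuntimeError(f"unexpected HTTP status {status} for {title!r}")
```

For example, `article_exists("Fod", lambda url: requests.get(url).status_code)` would reuse the `requests` call from this answer as the status source.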
Ginhing answered Nov 15 '22 09:11