Extract the main article text from a Wikipedia page using Python [closed]

Tags: python, parsing, extract, wikipedia

I've been searching for hours for a way to extract the main text of a Wikipedia article, without all the links and references. I've tried wikitools, mwlib, BeautifulSoup and more, but I haven't managed to get it working.

Is there an easy and fast way to get the plain text (the actual article) into a Python variable?

SOLUTION: Omid Raha solved it :)


asked Apr 28 '14 21:04

Paolo

1 Answer

You can use the wikipedia package, a Python wrapper for the Wikipedia API.
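A wrapper like this sits on top of the MediaWiki web API. If you would rather avoid the extra dependency, here is a rough stdlib-only sketch of a similar request. It assumes the TextExtracts module (`prop=extracts` with `explaintext`) is available on the target wiki, as it is on English Wikipedia. The function names and the User-Agent string are hypothetical placeholders, not part of any library.

```python
import json
import urllib.parse
import urllib.request

# English Wikipedia endpoint; change the language prefix for other wikis.
API_URL = "https://en.wikipedia.org/w/api.php"


def build_extract_url(title: str) -> str:
    """Build a query asking TextExtracts for the article as plain text."""
    params = {
        "action": "query",
        "prop": "extracts",
        "explaintext": 1,     # plain text instead of HTML
        "redirects": 1,       # follow redirects to the canonical title
        "titles": title,
        "format": "json",
        "formatversion": 2,   # "pages" comes back as a list
    }
    return API_URL + "?" + urllib.parse.urlencode(params)


def extract_from_response(data: dict) -> str:
    """Pull the extract out of a decoded formatversion=2 response."""
    pages = data["query"]["pages"]
    if not pages or pages[0].get("missing"):
        raise LookupError("page not found")
    return pages[0]["extract"]


def fetch_plain_text(title: str) -> str:
    """Fetch the plain-text body of one article (needs network access)."""
    request = urllib.request.Request(
        build_extract_url(title),
        # Placeholder User-Agent: Wikimedia asks clients to identify themselves.
        headers={"User-Agent": "example-script/0.1 (you@example.com)"},
    )
    with urllib.request.urlopen(request) as response:
        data = json.load(response)
    return extract_from_response(data)
```

Usage: `text = fetch_plain_text("Python (programming language)")` leaves the article body in `text`. Splitting URL building and response parsing into separate functions keeps both testable without a network connection.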

Here is a quick start.

First install it:

pip install wikipedia

Example:

import wikipedia
p = wikipedia.page("Python programming language")
print(p.url)
print(p.title)
content = p.content  # plain-text body of the article

Output:

http://en.wikipedia.org/wiki/Python_(programming_language)
Python (programming language)
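`content` still contains section headings and trailing sections such as "See also" and "References". A minimal cleanup sketch follows, assuming headings appear on their own lines as `== Title ==` (more `=` signs for subsections), which is how MediaWiki's plain-text extracts mark them. The function name and the list of trailing section titles are my own assumptions; adjust them for the articles you process.

```python
import re

# Section titles that usually hold links and citations rather than prose
# (an assumption; extend or trim this tuple as needed).
TRAILING_SECTIONS = ("See also", "Notes", "References", "Further reading", "External links")

# Matches heading lines such as "== History ==" or "=== Syntax ===".
HEADING_RE = re.compile(r"^(={2,6})\s*(.*?)\s*\1\s*$")


def clean_article_text(content: str, drop=TRAILING_SECTIONS) -> str:
    """Remove heading lines and cut everything from the first trailing section on."""
    kept = []
    for line in content.splitlines():
        match = HEADING_RE.match(line)
        if match:
            if match.group(2) in drop:
                break  # the rest of the article is reference material
            continue  # drop the heading line itself, keep the section body
        kept.append(line)
    # Collapse the runs of blank lines left where headings used to be.
    text = "\n".join(kept)
    return re.sub(r"\n{3,}", "\n\n", text).strip()
```

With the page from the example above, `clean_article_text(p.content)` returns just the prose paragraphs. Cutting at the first matching heading (rather than skipping only that section) relies on reference-style sections coming last in most articles.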


answered Sep 24 '22 01:09

Omid Raha
