Extract the first paragraph from a Wikipedia article (Python)

Tags:

python

wikipedia

How can I extract the first paragraph from a Wikipedia article, using Python?

For example, for Albert Einstein, that would be:

Albert Einstein (pronounced /ˈælbərt ˈaɪnstaɪn/; German: [ˈalbɐt ˈaɪnʃtaɪn] ( listen); 14 March 1879 – 18 April 1955) was a theoretical physicist, philosopher and author who is widely regarded as one of the most influential and iconic scientists and intellectuals of all time. A German-Swiss Nobel laureate, Einstein is often regarded as the father of modern physics.[2] He received the 1921 Nobel Prize in Physics "for his services to theoretical physics, and especially for his discovery of the law of the photoelectric effect".[3]


asked Dec 16 '10 12:12

2 Answers

I wrote a Python library that aims to make this very easy. Check it out on GitHub.

To install it, run

$ pip install wikipedia

Then to get the first paragraph of an article, just use the wikipedia.summary function.

>>> import wikipedia
>>> print(wikipedia.summary("Albert Einstein", sentences=2))

prints

Albert Einstein (/ˈælbərt ˈaɪnstaɪn/; German: [ˈalbɐt ˈaɪnʃtaɪn] ( listen); 14 March 1879 – 18 April 1955) was a German-born theoretical physicist who developed the general theory of relativity, one of the two pillars of modern physics (alongside quantum mechanics). While best known for his mass–energy equivalence formula E = mc2 (which has been dubbed "the world's most famous equation"), he received the 1921 Nobel Prize in Physics "for his services to theoretical physics, and especially for his discovery of the law of the photoelectric effect".

As far as how it works, wikipedia makes a request to the extracts module of the MediaWiki API (originally part of the MobileFrontend extension, now provided by the TextExtracts extension), which returns plain-text or limited-HTML extracts of Wikipedia articles. To be specific, by passing prop=extracts together with explaintext, the MediaWiki servers parse the wikitext and return a plain-text summary of the article you are requesting, up to and including the entire page text; exsectionformat=plain controls how section headings are rendered in that text. The module also accepts exchars and exsentences, which, not surprisingly, limit the number of characters and sentences returned, and exintro, which restricts the extract to the lead section before the first heading.
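To make that mechanism concrete without the wrapper library, here is a minimal standard-library sketch that queries the API directly. It is not the wikipedia package's own code. It assumes the English Wikipedia endpoint https://en.wikipedia.org/w/api.php and the default JSON format (pages keyed by page ID); the helper names and the User-Agent string are hypothetical. It asks for the lead section as plain text and then takes the first non-empty line, since plain-text extracts separate paragraphs with newlines.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint: English Wikipedia's MediaWiki API.
API_URL = "https://en.wikipedia.org/w/api.php"


def build_extract_url(title, sentences=None, api_url=API_URL):
    """Build a query URL asking for a plain-text extract of `title`."""
    params = {
        "action": "query",
        "format": "json",      # default formatversion 1: pages keyed by page ID
        "prop": "extracts",
        "explaintext": 1,      # plain text instead of limited HTML
        "exintro": 1,          # only the lead section, before the first heading
        "redirects": 1,        # follow redirects to the canonical title
        "titles": title,
    }
    if sentences is not None:
        params["exsentences"] = sentences  # cap the number of sentences
    return api_url + "?" + urllib.parse.urlencode(params)


def first_paragraph_from_response(payload):
    """Return the first non-empty paragraph in a parsed API response, or None."""
    for page in payload.get("query", {}).get("pages", {}).values():
        # Missing pages have no "extract" key; treat them as empty.
        for paragraph in page.get("extract", "").split("\n"):
            if paragraph.strip():
                return paragraph.strip()
    return None


def fetch_first_paragraph(title, sentences=None):
    """Fetch the extract over HTTP and return its first paragraph."""
    url = build_extract_url(title, sentences)
    # Wikimedia asks API clients to send a descriptive User-Agent (hypothetical value).
    request = urllib.request.Request(
        url, headers={"User-Agent": "first-paragraph-sketch/0.1"}
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        payload = json.load(response)
    return first_paragraph_from_response(payload)
```

With network access, fetch_first_paragraph("Albert Einstein") should return the article's opening paragraph; passing sentences=2 mirrors the wikipedia.summary call above. Keeping URL building and response parsing separate from the HTTP call makes both easy to test offline.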


answered Sep 22 '22 01:09

goldsmith

Some time ago I made two classes to get Wikipedia articles as plain text. I know they aren't the best solution, but you can adapt them to your needs:

wikipedia.py
wiki2plain.py

You can use it like this:

from wikipedia import Wikipedia
from wiki2plain import Wiki2Plain

lang = 'simple'                     # Simple English Wikipedia
wiki = Wikipedia(lang)

try:
    raw = wiki.article('Uruguay')   # raw wikitext of the article
except Exception:
    raw = None

if raw:
    wiki2plain = Wiki2Plain(raw)
    content = wiki2plain.text       # the article as plain text
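The wikipedia.py and wiki2plain.py files themselves are not reproduced here, so the following is not joksnet's code. It is a minimal regex-based sketch of the kind of wikitext-to-plain-text conversion the Wiki2Plain step performs, assuming you already have an article's raw wikitext, followed by picking the first line of ordinary prose as the first paragraph. The function names are hypothetical, and the regexes only cover common markup (footnotes, templates, links, bold/italic); nested tables, parser functions, and unusual templates need a real wikitext parser.

```python
import re


def strip_wikitext(raw):
    """Roughly convert wikitext to plain text (common markup only)."""
    text = raw
    # Drop footnotes: self-closing <ref ... /> first, then <ref>...</ref> pairs.
    text = re.sub(r"<ref[^>]*/>", "", text)
    text = re.sub(r"<ref[^>]*>.*?</ref>", "", text, flags=re.DOTALL)
    # Drop templates such as {{Infobox ...}}, innermost first, until none are left.
    previous = None
    while previous != text:
        previous = text
        text = re.sub(r"\{\{[^{}]*\}\}", "", text)
    # [[Target|label]] -> label, [[Target]] -> Target
    text = re.sub(r"\[\[(?:[^\]]*\|)?([^\]|]*)\]\]", r"\1", text)
    # Remove '' (italic) and ''' (bold) quote markup.
    text = re.sub(r"'{2,}", "", text)
    # Remove any remaining HTML-like tags.
    text = re.sub(r"<[^>]+>", "", text)
    return text


def first_plain_paragraph(raw):
    """Return the first line of ordinary prose after stripping markup, or None."""
    for line in strip_wikitext(raw).split("\n"):
        line = line.strip()
        # Skip blanks, headings (=), lists (*, #, :), and table syntax ({, |, !).
        if line and not line.startswith(("=", "*", "#", ":", "{", "|", "!")):
            return line
    return None
```

Peeling templates from the innermost level outward is what lets a single non-nesting regex handle nested templates such as an infobox containing a language template.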

answered Sep 21 '22 01:09

joksnet

Question asked by Alon Gubkin.