For example, using this Wikipedia API query:
http://en.wikipedia.org/w/api.php?action=query&prop=revisions&titles=lebron%20james&rvprop=content&redirects=true&format=xmlfm
Is there an existing Python library I can use to build a mapping of the infobox fields to their values?
For example:
{height_ft,6},{nationality, American}
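To illustrate, here is a naive sketch of the kind of mapping I mean, assuming the infobox arrives as wikitext with one `| field = value` line per attribute (the sample text and the regex below are just placeholders, not a real parser):

```python
import re

# Illustrative snippet of infobox wikitext (the real article has many more fields).
wikitext = """
{{Infobox basketball biography
| name        = LeBron James
| height_ft   = 6
| height_in   = 9
| nationality = American
}}
"""

# Naive parse: one "| field = value" pair per line. A real parser would need
# to handle nested templates, links, and multi-line values.
infobox = dict(re.findall(r"^\|\s*(\w+)\s*=\s*(.+?)\s*$", wikitext, re.MULTILINE))
print(infobox)
# {'name': 'LeBron James', 'height_ft': '6', 'height_in': '9', 'nationality': 'American'}
```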
Rather than pulling pages through the API, you can access a dump of all of Wikipedia through Wikimedia at dumps.wikimedia.org. (A dump is a periodic snapshot of a database.) The English version is at dumps.wikimedia.org/enwiki.
Note: Wikipedia is huge. Downloading the entire thing will take a while, and you'll need at least 150 GB of free storage.
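If you do want the full dump, a minimal sketch of streaming it to disk looks like this; the filename follows the usual enwiki-latest-pages-articles.xml.bz2 pattern, but check the dumps page for the exact files currently offered:

```python
import shutil
import urllib.request

# Articles-only dump of English Wikipedia. The "latest" filename pattern is an
# assumption -- browse dumps.wikimedia.org/enwiki/ to confirm what's available.
URL = "https://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles.xml.bz2"

# Stream straight to disk; the compressed file alone is tens of gigabytes.
with urllib.request.urlopen(URL) as response, \
        open("enwiki-latest-pages-articles.xml.bz2", "wb") as out:
    shutil.copyfileobj(response, out)
```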
It looks like you really want to parse MediaWiki markup. There is a Python library designed for this purpose called mwlib. You can use Python's built-in XML packages to extract the page content from the API's response, then pass that content into mwlib's parser to produce an object representation that you can browse and analyse in code to extract the information you want. mwlib is BSD licensed.
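A rough sketch of that pipeline, assuming the classic format=xml response where the revision text sits directly inside a <rev> element (newer API versions may nest it under rvslots), with the hand-off to mwlib left as a comment since I haven't verified its current API:

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Build the same API query as in the question, but with plain format=xml.
params = urllib.parse.urlencode({
    "action": "query",
    "prop": "revisions",
    "titles": "lebron james",
    "rvprop": "content",
    "redirects": "true",
    "format": "xml",
})
# Wikimedia asks clients to send a descriptive User-Agent.
request = urllib.request.Request(
    "https://en.wikipedia.org/w/api.php?" + params,
    headers={"User-Agent": "infobox-example/0.1 (contact: you@example.com)"},
)

with urllib.request.urlopen(request) as response:
    tree = ET.parse(response)

# The revision's wikitext is the text of the <rev> element under <page>/<revisions>.
rev = tree.find(".//rev")
wikitext = rev.text

# From here you would hand the markup to mwlib, roughly:
#     from mwlib.uparser import parseString
#     article = parseString(title="lebron james", raw=wikitext)
# (parseString is how I understand mwlib's helper; check its docs before relying on it.)
print(wikitext[:200])
```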
I just stumbled over a library on PyPI, wikidump, that claims to provide
"Tools to manipulate and extract data from wikipedia dumps"
I haven't used it yet, so you're on your own trying it...