I found a Python script (Wikipedia Extractor) that can generate plain text from an (English) Wikipedia database dump. When I run this command (as stated on the script's page):
$ python enwiki-latest-pages-articles.xml WikiExtractor.py -b 500K -o extracted
I get this error:
File "enwiki-latest-pages-articles.xml", line 1
<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.8/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.mediawiki.org/xml/export-0.8/ http://www.mediawiki.org/xml/export-0.8.xsd" version="0.8" xml:lang="en">
^
SyntaxError: invalid syntax
I'm executing the script using Python 2.7.6 & Cygwin on Windows 7.
I hope someone who has already used this script, or who has experience with Python, can help me solve this error.
Thanks in advance!
The first argument to python should be the script name. You probably need to swap the xml and py file names:
$ python WikiExtractor.py enwiki-latest-pages-articles.xml -b 500K -o extracted
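To see why the original command fails, here is a minimal sketch that imitates what the interpreter does with its first file argument: it tries to compile that file as Python source. The helper name parses_as_python and the shortened XML line are made up for illustration; only the built-in compile() is used.

```python
# Shortened version of the first line of the dump (illustrative only).
xml_first_line = '<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.8/" version="0.8" xml:lang="en">'


def parses_as_python(source):
    """Return True if `source` compiles as Python code, False on SyntaxError."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:
        return False


# XML is not valid Python, so compiling it raises SyntaxError,
# which is the same error the question shows.
print(parses_as_python(xml_first_line))  # False

# A line of real Python source compiles without complaint.
print(parses_as_python("import sys"))  # True
```

With the arguments in the corrected order, python runs WikiExtractor.py as the script, and the XML file name is simply passed along to it as an argument.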