I am trying to do some research on Chinese people using Wikipedia data. Rather than using DBpedia (its information about Chinese people is rather limited compared to zh.wikipedia.org), I found that I can download a dump of zhwiki directly from http://download.wikipedia.com/zhwiki/20150301/.
There is an index file, in which I can see rows such as: 966576:291:人物
I assume this is a lookup key? Can someone tell me how to use it to find the corresponding page in the main file or database?
You can either download the dumps from https://dumps.wikimedia.org/enwiki/ and parse them locally, or query the API instead. If you want to parse the dumps, https://jamesthorne.co.uk/blog/processing-wikipedia-in-a-couple-of-hours/ is a good article that shows one way to do that.
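For the API route, a minimal Python sketch might look like the following. It assumes the standard MediaWiki Action API endpoint at /w/api.php on the language subdomain; the helper names build_query_url and fetch_wikitext and the User-Agent string are made up for illustration, so adapt them to your own project.

```python
import json
import urllib.parse
import urllib.request


def build_query_url(title, lang="zh"):
    """Build an Action API URL asking for the current wikitext of one page."""
    params = {
        "action": "query",
        "prop": "revisions",
        "rvprop": "content",
        "rvslots": "main",
        "titles": title,
        "format": "json",
        "formatversion": "2",
    }
    return f"https://{lang}.wikipedia.org/w/api.php?" + urllib.parse.urlencode(params)


def fetch_wikitext(title, lang="zh"):
    """Fetch the page over the network (hypothetical helper, no error handling)."""
    request = urllib.request.Request(
        build_query_url(title, lang),
        # Placeholder User-Agent; replace it with one that identifies your script.
        headers={"User-Agent": "example-research-script/0.1"},
    )
    with urllib.request.urlopen(request) as response:
        data = json.load(response)
    page = data["query"]["pages"][0]
    return page["revisions"][0]["slots"]["main"]["content"]
```

Calling fetch_wikitext("人物") should return the raw wikitext of that page. For more than a handful of pages the dumps are probably the better choice, since they avoid one request per page.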
Browse to the page you want to download. Make sure you have Desktop view selected. Mobile devices which default to the Mobile view do not display the required options; to switch to Desktop view, scroll to the bottom of the page and select Desktop. In the left sidebar, under Print/export, select Download as PDF.
Wikipedia offers free copies of all available content to interested users. These databases can be used for mirroring, personal use, informal backups, offline use or database queries (such as for Wikipedia:Maintenance).
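There are two files: the multistream dump itself (its name ends in pages-articles-multistream.xml.bz2) and its index (its name ends in pages-articles-multistream-index.txt.bz2).
The index file has lines of the form offset:page_id:title. For example, 966576:291:人物 means that the page with ID 291 and title 人物 is in the stream that starts at byte 966576.
The offset is the starting byte offset of a bz2 stream inside the dump file. Many consecutive index lines share the same offset, because each stream holds up to 100 pages. To read a page, take its offset (offset1) and the next larger offset in the index (offset2), read the bytes from offset1 to offset2 from the bz2 file, and pass them to a bz2 decoder. That gives you the XML for the pages in that stream, which you can then search for the page ID or title you want.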
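A minimal Python sketch of that lookup, assuming index lines have the form offset:page_id:title (as in 966576:291:人物) and that the index has already been decompressed into lines of text. The function names are made up for illustration, and the result is the raw XML of the whole stream, not just the one page.

```python
import bz2


def parse_index_line(line):
    """Split 'offset:page_id:title'; titles may themselves contain colons."""
    offset, page_id, title = line.rstrip("\n").split(":", 2)
    return int(offset), int(page_id), title


def stream_bounds(index_lines, wanted_title):
    """Return (start, end) byte offsets of the stream holding wanted_title.

    end is None when the page is in the last stream listed in the index.
    """
    start = None
    for line in index_lines:
        offset, _page_id, title = parse_index_line(line)
        if start is None:
            if title == wanted_title:
                start = offset
        elif offset > start:
            return start, offset
    if start is None:
        raise KeyError(wanted_title)
    return start, None


def read_stream(dump_path, start, end=None):
    """Decompress the single bz2 stream that begins at byte `start`."""
    with open(dump_path, "rb") as dump:
        dump.seek(start)
        data = dump.read() if end is None else dump.read(end - start)
    # BZ2Decompressor stops at the end of the first stream, so any
    # trailing bytes from later streams are ignored.
    return bz2.BZ2Decompressor().decompress(data).decode("utf-8")
```

The returned text is a run of <page> elements without a single root element, so wrap it in a dummy root before handing it to an XML parser, or search it with a streaming parser.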