Processing a Wikipedia dump file

Question

I want to process a Wikipedia dump file. Specifically, I want to extract the title, categories, and text content of each article. Is there a Java API or tool that can help me do that? Thanks in advance.

Greg Hewgill · Accepted Answer

The Wikipedia dump file is in XML format. Therefore, you can use any available XML tools for this purpose.

Note that because the dump file is so large, a SAX parser will generally be much more efficient than a DOM parser: SAX streams through the file one event at a time, whereas a DOM parser tries to build an in-memory tree of the entire document.
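As a rough illustration of the SAX approach, here is a minimal sketch using only the JDK's built-in parser. It assumes the usual dump layout, where each `<page>` element contains a `<title>` and a `<revision>` whose `<text>` element holds the wikitext, and it assumes categories can be picked out of the wikitext as `[[Category:...]]` links with a simple regex. The class name `WikiDumpSketch`, the `Page` holder, and the `parse` method are hypothetical names made up for this example, and the input stream is assumed to be already-decompressed XML.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class WikiDumpSketch {

    /** Hypothetical holder for one extracted article. */
    public static final class Page {
        public final String title;
        public final String text;
        public final List<String> categories;

        Page(String title, String text, List<String> categories) {
            this.title = title;
            this.text = text;
            this.categories = categories;
        }
    }

    // Assumption: categories appear as [[Category:Name]] or [[Category:Name|sort key]].
    private static final Pattern CATEGORY =
            Pattern.compile("\\[\\[Category:([^\\]|]+)(?:\\|[^\\]]*)?\\]\\]");

    static List<String> extractCategories(String wikitext) {
        List<String> result = new ArrayList<>();
        Matcher m = CATEGORY.matcher(wikitext);
        while (m.find()) {
            result.add(m.group(1).trim());
        }
        return result;
    }

    /** Streams the XML and hands each finished page to the sink, so pages are not accumulated. */
    public static void parse(InputStream in, Consumer<Page> sink) throws Exception {
        SAXParserFactory.newInstance().newSAXParser().parse(in, new DefaultHandler() {
            private final StringBuilder buf = new StringBuilder();
            private String title;
            private String text;

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                buf.setLength(0); // start collecting character data for this element
                if (qName.equals("page")) {
                    title = null;
                    text = null;
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                buf.append(ch, start, length); // may be called several times per element
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                switch (qName) {
                    case "title":
                        title = buf.toString();
                        break;
                    case "text":
                        text = buf.toString();
                        break;
                    case "page":
                        if (title != null && text != null) {
                            sink.accept(new Page(title, text, extractCategories(text)));
                        }
                        break;
                    default:
                        break;
                }
            }
        });
    }

    public static void main(String[] args) throws Exception {
        // Tiny in-memory stand-in for a real dump file.
        String xml = "<mediawiki><page><title>Example</title><revision>"
                + "<text>Some text. [[Category:Demo]]</text></revision></page></mediawiki>";
        parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
                p -> System.out.println(p.title + " " + p.categories));
    }
}
```

For a real dump you would pass a stream over the decompressed file instead of the in-memory string. Using a callback rather than returning a list keeps memory use roughly constant, which is the point of choosing SAX here. The regex is only a rough approximation of category markup and will miss categories added through templates.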

Tags:

java

user1212009
