
Parser for Wikipedia

I downloaded a Wikipedia dump and want to convert its wiki markup into my own object format. Is there a wiki parser available that converts the markup into XML?

Boolean asked Oct 08 '10 06:10





1 Answer

See java-wikipedia-parser. I have never used it, but according to the docs:

The parser comes with an HTML generator. You can however control the output that is being generated by passing your own implementation of the be.devijver.wikipedia.Visitor interface.
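To illustrate the idea of swapping the HTML generator for your own output, here is a minimal, self-contained sketch of the visitor approach. Everything in it is an assumption made for illustration: `MarkupVisitor`, `XmlVisitor`, `parse` and `toXml` are hypothetical names, and the interface is not the real `be.devijver.wikipedia.Visitor` (check the library's docs for its actual methods). The toy parser only recognizes `== Heading ==` lines and plain-text lines.

```java
class WikiToXmlSketch {

    // Hypothetical callback interface; NOT the real be.devijver.wikipedia.Visitor.
    interface MarkupVisitor {
        void startDocument();
        void heading(int level, String text);
        void paragraph(String text);
        void endDocument();
    }

    // Toy parser: handles only "== Heading ==" lines and plain-text lines.
    static void parse(String wikiText, MarkupVisitor visitor) {
        visitor.startDocument();
        for (String rawLine : wikiText.split("\n")) {
            String line = rawLine.trim();
            if (line.isEmpty()) {
                continue;
            }
            // Count the leading '=' characters to get the heading level.
            int level = 0;
            while (level < line.length() && line.charAt(level) == '=') {
                level++;
            }
            // A heading must close with the same run of '=' it opened with.
            boolean isHeading = level > 0
                    && line.length() > 2 * level
                    && line.endsWith(line.substring(0, level));
            if (isHeading) {
                String text = line.substring(level, line.length() - level).trim();
                visitor.heading(level, text);
            } else {
                visitor.paragraph(line);
            }
        }
        visitor.endDocument();
    }

    // Visitor that emits XML instead of HTML; the element names are made up.
    static class XmlVisitor implements MarkupVisitor {
        private final StringBuilder out = new StringBuilder();

        public void startDocument() {
            out.append("<article>");
        }

        public void heading(int level, String text) {
            out.append("<heading level=\"").append(level).append("\">")
               .append(escape(text)).append("</heading>");
        }

        public void paragraph(String text) {
            out.append("<p>").append(escape(text)).append("</p>");
        }

        public void endDocument() {
            out.append("</article>");
        }

        String result() {
            return out.toString();
        }

        // Escape the characters that are unsafe in XML text content.
        private static String escape(String s) {
            return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
        }
    }

    // Convenience wrapper: parse the text and return the generated XML.
    static String toXml(String wikiText) {
        XmlVisitor visitor = new XmlVisitor();
        parse(wikiText, visitor);
        return visitor.result();
    }

    public static void main(String[] args) {
        System.out.println(toXml("== History ==\nFounded in 2001 & still growing."));
    }
}
```

With the real library you would presumably implement its `Visitor` interface in the same spirit and build your own objects or XML in the callbacks, rather than writing a parser yourself.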

dogbane answered Oct 13 '22 00:10
