Is there any place where I can download a treebank of English phrases for free, or for less than $100? I need training data containing a bunch of syntactically parsed English sentences (>1000) in any format. Basically, all I need is the words in these sentences tagged with their part of speech.
The Penn Treebank tagset contains 36 POS tags and 12 other tags (for punctuation and currency symbols).
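If you only want word-level tags, a minimal sketch is to keep the 36 word tags in a set and drop everything else (punctuation, currency symbols, and empty elements such as `-NONE-`). The tag spellings below assume the ones used in the released Wall Street Journal data (e.g. `PRP$`), and the helper name `word_tokens` is hypothetical.

```python
# The 36 word-level POS tags of the Penn Treebank tagset, spelled as in the
# released WSJ data (assumption). The other 12 tags cover punctuation and
# currency symbols such as ".", ",", "$" and "#".
PTB_POS_TAGS = frozenset({
    "CC", "CD", "DT", "EX", "FW", "IN", "JJ", "JJR", "JJS", "LS", "MD",
    "NN", "NNS", "NNP", "NNPS", "PDT", "POS", "PRP", "PRP$", "RB", "RBR",
    "RBS", "RP", "SYM", "TO", "UH", "VB", "VBD", "VBG", "VBN", "VBP",
    "VBZ", "WDT", "WP", "WP$", "WRB",
})


def word_tokens(tagged_sentence):
    """Hypothetical helper: keep (word, tag) pairs whose tag is a word-level tag.

    Punctuation, currency symbols and empty elements (-NONE-) are dropped
    because their tags are not in PTB_POS_TAGS.
    """
    return [(word, tag) for word, tag in tagged_sentence if tag in PTB_POS_TAGS]


sentence = [("It", "PRP"), ("costs", "VBZ"), ("$", "$"), ("100", "CD"), (".", ".")]
print(word_tokens(sentence))  # [('It', 'PRP'), ('costs', 'VBZ'), ('100', 'CD')]
```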
From a computational linguistics perspective, treebanks have been used to build state-of-the-art natural language processing systems such as part-of-speech taggers, parsers, semantic analyzers and machine translation systems. Most of these systems use gold-standard (human-annotated) treebank data.
The Penn Treebank, or PTB for short, is a dataset maintained by the University of Pennsylvania. It is large: it contains over 4.8 million annotated words, all corrected by humans.
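Treebanks in the Penn style store each sentence as a bracketed tree, where every word sits in a preterminal of the form `(TAG word)`. If you only need the POS-tagged words, a stdlib-only sketch like the one below can pull them out. It assumes that words never contain raw parentheses (the PTB writes brackets as `-LRB-`/`-RRB-`), and the function name `tagged_words` and the example trees are illustrative, not taken from any real corpus file.

```python
import re
from typing import List, Tuple

# A bracketed tree tokenizes into "(", ")" and runs of other non-space characters.
_TOKEN = re.compile(r"\(|\)|[^\s()]+")
_PARENS = ("(", ")")


def tagged_words(tree: str) -> List[Tuple[str, str]]:
    """Hypothetical helper: return (word, tag) pairs from a PTB-style tree.

    A preterminal is the 4-token pattern "(", TAG, WORD, ")". Empty
    elements tagged -NONE- (traces such as *T*-1) are skipped.
    """
    tokens = _TOKEN.findall(tree)
    pairs = []
    for i in range(len(tokens) - 3):
        if (tokens[i] == "("
                and tokens[i + 1] not in _PARENS
                and tokens[i + 2] not in _PARENS
                and tokens[i + 3] == ")"):
            tag, word = tokens[i + 1], tokens[i + 2]
            if tag != "-NONE-":
                pairs.append((word, tag))
    return pairs


sample = ("( (S (NP-SBJ (DT The) (NN cat)) (VP (VBD sat) "
          "(PP-LOC (IN on) (NP (DT the) (NN mat)))) (. .)) )")
print(tagged_words(sample))
# [('The', 'DT'), ('cat', 'NN'), ('sat', 'VBD'), ('on', 'IN'), ('the', 'DT'), ('mat', 'NN'), ('.', '.')]
```

For real work a tree library such as NLTK's `Tree.fromstring` is more robust; the sketch only shows how little structure you need to recover the tags.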
Here are a few English treebanks that are available for free:
American National Corpus: MASC
Questions: QuestionBank and Stanford's corrections
British news: BNC
TED talks: NAIST-NTT TED Treebank
Georgetown University Multilayer Corpus: GUM
Biomedical:
NaCTeM GENIA treebank
Brown GENIA treebank
CRAFT corpus
See also Wikipedia for a huge list.
NLTK (for Python) offers several treebanks for free.