
List of English verbs and their tenses, various forms, etc.

Tags:

nlp

Is there a huge CSV/XML or whatever file somewhere that contains a list of English verbs and their variations (e.g. sell -> sold, sale, selling, seller, sellee)?

I imagine this will be useful for NLP systems, but there doesn't seem to be a listing anywhere, or it could be my terrible googling skills. Does anybody have a clue otherwise?

kamziro asked Dec 13 '12 05:12


3 Answers

Consider Catvar:

A Categorial-Variation Database (or Catvar) is a database of clusters of uninflected words (lexemes) and their categorial (i.e. part-of-speech) variants. For example, the words hunger(V), hunger(N), hungry(AJ) and hungriness(N) are different English variants of some underlying concept describing the state of being hungry. Another example is the developing cluster:(develop(V), developer(N), developed(AJ), developing(N), developing(AJ), development(N)).
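To make the idea of a cluster lookup concrete, here is a minimal sketch that indexes the two example clusters quoted above and returns the variants of a word, optionally filtered by part of speech. The data layout and the function names `build_index` and `variants` are assumptions made for illustration; this is not Catvar's actual file format or interface.

```python
from collections import defaultdict

# Clusters copied from the examples quoted above: (word, part-of-speech) pairs.
# The in-memory layout is hypothetical, not Catvar's real format.
CLUSTERS = [
    [("hunger", "V"), ("hunger", "N"), ("hungry", "AJ"), ("hungriness", "N")],
    [("develop", "V"), ("developer", "N"), ("developed", "AJ"),
     ("developing", "N"), ("developing", "AJ"), ("development", "N")],
]

def build_index(clusters):
    """Map every word to the clusters it appears in."""
    index = defaultdict(list)
    for cluster in clusters:
        # Use a set so a word listed under two parts of speech is indexed once.
        for word in {w for w, _ in cluster}:
            index[word].append(cluster)
    return index

def variants(index, word, pos=None):
    """Return (word, pos) pairs from the word's clusters, optionally filtered by pos."""
    found = []
    for cluster in index.get(word, []):
        for w, p in cluster:
            if pos is None or p == pos:
                found.append((w, p))
    return found

index = build_index(CLUSTERS)
print(variants(index, "hungry", pos="N"))
```

Looking up by any member of a cluster gives the whole cluster back, which is the kind of "sell -> seller, sale" expansion the question asks about.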

Kenston Choi answered Oct 19 '22 19:10


I am not sure what you are looking for but I think WordNet -- a lexical database for the English language -- would be a good place to start. Read more at http://wordnet.princeton.edu/

The page linked above says:

WordNet's structure makes it a useful tool for computational linguistics and natural language processing.

One answered Oct 19 '22 20:10


Consider getting a dump of Wiktionary and extracting this information from it.
http://en.wiktionary.org/wiki/sell mentions many of the forms of the word (sells, selling, sold).
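A rough sketch of the extraction step, assuming the entry's wikitext carries its verb forms in an `{{en-verb|...}}` headword template. The sample string below is made up for illustration rather than copied from a real dump, and the meaning of each positional parameter varies between entries, so check the template's documentation before relying on positions.

```python
import re

# Matches an {{en-verb|...}} template and captures its argument string.
# Assumption: the arguments contain no nested templates.
EN_VERB = re.compile(r"\{\{en-verb\|([^{}]*)\}\}")

def verb_template_args(wikitext):
    """Return the positional arguments of each en-verb template found."""
    results = []
    for match in EN_VERB.finditer(wikitext):
        parts = [p.strip() for p in match.group(1).split("|")]
        # Keep positional arguments only; skip named ones like key=value.
        results.append([p for p in parts if p and "=" not in p])
    return results

# Hypothetical wikitext fragment for the entry "sell".
sample = "===Verb===\n{{en-verb|sells|selling|sold}}\n# To transfer goods..."
print(verb_template_args(sample))
```

Running this over every page in a dump would give a raw list of forms per verb, which still needs cleaning before it resembles the CSV the question asks for.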

If your aim is simply to normalize words to some base canonical form, consider using a lemmatizer or stemmer. Try playing with morpha, which is a really good English lemmatizer.
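To show what normalizing to a base form means in practice, here is a deliberately crude sketch: a lookup table for irregular forms plus a few suffix rules. It is not morpha and does not reproduce its rules; the table, the rules, and the name `toy_lemma` are all assumptions for illustration, and it will get many words wrong (for example "making" becomes "mak").

```python
# Irregular forms are looked up directly; this tiny table is illustrative only.
IRREGULAR = {"sold": "sell", "went": "go", "was": "be", "were": "be"}

# (suffix, replacement) pairs tried in order; intentionally simplistic.
SUFFIX_RULES = [("ies", "y"), ("ing", ""), ("ed", ""), ("s", "")]

def toy_lemma(word):
    """Return a rough base form for an English word."""
    word = word.lower()
    if word in IRREGULAR:
        return IRREGULAR[word]
    for suffix, replacement in SUFFIX_RULES:
        # Require at least three letters of stem so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: len(word) - len(suffix)] + replacement
    return word

for w in ["sold", "selling", "sells", "carries", "sell"]:
    print(w, "->", toy_lemma(w))
```

A real lemmatizer handles the cases this one misses (consonant doubling, silent "e", part-of-speech ambiguity), which is the reason to reach for an existing tool.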

Aditya Mukherji answered Oct 19 '22 19:10