Is there a huge CSV/XML or whatever file somewhere that contains a list of english verbs and their variations (e.g sell -> sold, sale, selling, seller, sellee)?
I imagine this will be useful for NLP systems, but there doesn't seem to be a listing anywhere, or it could be my terrible googling skills. Does anybody have a clue otherwise?
Consider Catvar:
A Categorial-Variation Database (or Catvar) is a database of clusters of uninflected words (lexemes) and their categorial (i.e. part-of-speech) variants. For example, the words hunger(V), hunger(N), hungry(AJ) and hungriness(N) are different English variants of some underlying concept describing the state of being hungry. Another example is the developing cluster:(develop(V), developer(N), developed(AJ), developing(N), developing(AJ), development(N)).
I am not sure what you are looking for but I think WordNet
-- a lexical database for the English language -- would be a good place to start. Read more at http://wordnet.princeton.edu/
The link I referred to you says that
WordNet's structure makes it a useful tool for computational linguistics and natural language processing.
Considering getting a dump of wiktionary and extracting this information out of it.
http://en.wiktionary.org/wiki/sell mentions many of the forms of the word (sells, selling, sold).
If your aim is simply to normalize words to some base canonical form, considering using a lemmatizer or stemmer. Trying playing with morpha which is a really good english lemmatizer.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With