 

Algorithm for Determining Word Type using WordNet Database

I'm working on a project that requires scanning through paragraphs of natural English text and detecting what type each word is. The application uses AJAX, PHP, and MySQL.

My application doesn't need to be 100% accurate; it simply tries to find the content that best matches the text input. To do this, I've used an SQL version of the WordNet database, which lets me look up words and their types through the dict view, like so:

SELECT lemma, pos FROM dict WHERE lemma = 'fool' ORDER BY lemma;

The above is an example of what the database sees, but my PHP code actually builds dynamic bound parameters from the text sent by the AJAX calls, so in reality the query will contain many keywords.

This returns an array of records containing each searched word and its type.
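The dynamic binding described above (one bound parameter per keyword) can be sketched roughly as follows. This is Python with sqlite3 rather than PHP/MySQL, and the in-memory `dict` table is a hypothetical stand-in for the WordNet view with only the `lemma` and `pos` columns; the rows are made-up sample data (three noun rows and four verb rows for "fool", matching the counts in the question), not real WordNet content.

```python
import sqlite3

def lookup_pos(conn, words):
    """Return (lemma, pos) rows for all words using one bound parameter each."""
    # Lowercase and deduplicate; an empty list would produce invalid SQL "IN ()".
    unique = sorted({w.lower() for w in words})
    if not unique:
        return []
    placeholders = ", ".join("?" for _ in unique)
    sql = f"SELECT lemma, pos FROM dict WHERE lemma IN ({placeholders}) ORDER BY lemma"
    return conn.execute(sql, unique).fetchall()

# Hypothetical stand-in for the WordNet `dict` view; rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dict (lemma TEXT, pos TEXT)")
conn.executemany(
    "INSERT INTO dict VALUES (?, ?)",
    [("fool", "n")] * 3 + [("fool", "v")] * 4 + [("happy", "a")],
)

rows = lookup_pos(conn, ["Fool", "happy", "fool"])
print(len(rows))  # 8
```

Only the placeholder string is built dynamically; the words themselves always travel as bound values, so the text from the AJAX call never gets spliced into the SQL.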

My problem, however, is that most words can have multiple types. For example, the "fool" query returns three records as a noun and four as a verb. I don't need the fine-grained differences between those senses, but I would like to know whether the word is being used as a noun or a verb in the text.

The same problem applies to most words, so I cannot reliably detect a word's type, because any of its listed uses could apply.

I am wondering if anybody could point me toward an algorithm, or suggest what I could do, to make at least a best guess at each word's type.

The ones most important to get right are adjectives and nouns.
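Working only from the WordNet rows, the obvious "best guess" is a context-free baseline: pick whichever type has the most rows for that word. The sketch below assumes rows shaped like the `(lemma, pos)` query result and WordNet-style codes (`n`, `v`, `a`, `s` for adjective satellite, `r`); the tie-break order favouring nouns and then adjectives is an assumption based on the priorities stated above. It also shows the limitation: a word gets the same guess everywhere, so "fool" comes out as a verb even in "he is a fool".

```python
from collections import Counter, defaultdict

# Assumed tie-break order: nouns first, then adjectives (the stated priorities).
PREFERENCE = ["n", "a", "s", "v", "r"]

def best_guess_pos(rows):
    """Map each lemma to the POS code with the most (lemma, pos) rows.

    Context is ignored entirely: a word receives the same guess
    wherever it appears in the text.
    """
    counts = defaultdict(Counter)
    for lemma, pos in rows:
        counts[lemma][pos] += 1
    guesses = {}
    for lemma, counter in counts.items():
        # Highest count wins; on equal counts, the earlier PREFERENCE entry wins.
        guesses[lemma] = min(
            counter,
            key=lambda p: (-counter[p],
                           PREFERENCE.index(p) if p in PREFERENCE else len(PREFERENCE)),
        )
    return guesses

# Made-up sample rows mirroring the counts described in the question.
rows = [("fool", "n")] * 3 + [("fool", "v")] * 4 + [("happy", "a")]
print(best_guess_pos(rows))  # {'fool': 'v', 'happy': 'a'}
```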

Arcana asked Jan 04 '15 18:01


1 Answer

The task you are trying to accomplish is called part-of-speech (POS) tagging (as already suggested in the comments), and WordNet is definitely NOT the tool for it. The comments also link to a very simple PHP approach to POS tagging. There are many libraries for POS tagging; the one linked in the comments implements the Brill tagger, which is very simple and achieves good results. For better accuracy, I'd suggest the Stanford NLP tools, which have PHP interfaces, for example: https://github.com/agentile/PHP-Stanford-NLP
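To give a feel for why a tagger can do what a WordNet lookup cannot, here is a toy sketch of the transformation-based idea behind the Brill tagger: tag every word with its most likely tag, then apply ordered rules of the form "change tag A to B when the previous tag is C". The lexicon and both rules are hand-written and hypothetical (Penn Treebank-style tags); a real Brill tagger learns its rules from an annotated corpus, and this is not the code of the library linked in the comments.

```python
# Hypothetical lexicon: most likely tag per word (Penn Treebank-style tags).
LEXICON = {
    "the": "DT", "a": "DT", "to": "TO", "do": "VB", "not": "RB",
    "fool": "NN", "me": "PRP", "he": "PRP", "is": "VBZ", "happy": "JJ",
}

# Ordered, hand-written rules: (from_tag, to_tag, required previous tag).
RULES = [
    ("NN", "VB", "TO"),  # "to fool" -> verb
    ("NN", "VB", "RB"),  # "not fool" -> verb (toy rule)
]

def tag(sentence):
    words = sentence.lower().split()
    # Step 1: initial tagging from the lexicon; unknown words default to NN.
    tags = [LEXICON.get(w, "NN") for w in words]
    # Step 2: apply each rule in order, left to right, updating tags in place.
    for from_tag, to_tag, prev_tag in RULES:
        for i in range(1, len(tags)):
            if tags[i] == from_tag and tags[i - 1] == prev_tag:
                tags[i] = to_tag
    return list(zip(words, tags))

print(tag("He is a fool"))    # "fool" stays NN
print(tag("Do not fool me"))  # "fool" becomes VB
```

The same word gets different tags depending on its neighbours, which is exactly the noun-vs-verb distinction the question asks about.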

There are a couple of related SO questions:

  • How to implement a Part-of-Speech (POS) tagger

  • Pos Tagger in PHP

Josep Valls answered Sep 18 '22 17:09