Negating sentences using POS-tagging

Tags: regex, php, nlp

I'm trying to find a way to negate sentences based on POS-tagging. Please consider:

include_once 'class.postagger.php';

function negate($sentence) {
  $tagger = new PosTagger('includes/lexicon.txt');
  $tags = $tagger->tag($sentence);

  // Build a "token/TAG token/TAG ..." string from the tagger output
  $input = array();
  foreach ($tags as $t) {
    $input[] = trim($t['token']) . '/' . trim($t['tag']);
  }
  $sentence = implode(' ', $input);
  $postagged = $sentence;

  // Prefix "not" to every adjective (JJ), modal (MD), adverb (RB) and verb (VB, VBD, VBN)
  // Todo: ignore negative words (not, never, neither)
  $sentence = preg_replace('/(\w+)\/(JJ|MD|RB|VB|VBD|VBN)\b/', 'not$1/$2', $sentence);

  // Remove all POS tags (the "$" in the class also covers tags such as PRP$)
  $sentence = preg_replace('/\/[A-Z$]+/', '', $sentence);

  return "$postagged<br>$sentence";
}

BTW: In this example, I'm using the POS-tagging implementation and lexicon of Ian Barber. An example of this code running would be:

echo negate("I will never go to their place again");
I/NN will/MD never/RB go/VB to/TO their/PRP$ place/NN again/RB 
I notwill notnever notgo to their place notagain
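Because class.postagger.php and its lexicon aren't shown here, the following sketch skips the tagger and starts from the tagged string printed above (hardcoded and single-spaced). It applies the same two preg_replace() calls as negate() and reproduces the notnever problem without any external files:

```php
<?php
// Tagged string copied from the example output above (hardcoded; no tagger needed).
$postagged = 'I/NN will/MD never/RB go/VB to/TO their/PRP$ place/NN again/RB';

// Step 1, as in negate(): prefix "not" to every JJ/MD/RB/VB/VBD/VBN token.
$negated = preg_replace('/(\w+)\/(JJ|MD|RB|VB|VBD|VBN)\b/', 'not$1/$2', $postagged);

// Step 2, as in negate(): strip the "/TAG" suffixes.
$plain = preg_replace('/\/[A-Z$]+/', '', $negated);

echo $plain, "\n"; // I notwill notnever notgo to their place notagain
```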

As you can see (and as noted in the code comment), the negation words themselves are negated as well: never becomes notnever, which obviously shouldn't happen. Since my regex skills are limited, is there a way to exclude these words from the regex?
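One way to exclude them, sketched here under the assumption that the tagged-string format above is kept, is to let preg_replace_callback() decide per match instead of encoding the exclusion in the pattern: the callback compares the whole word against a stoplist and returns the match unchanged if it is a negation word. The NEGATION_WORDS constant and the prefixNot() function are illustrative names, not part of the original code.

```php
<?php
// Illustrative stoplist: words that must never receive a "not" prefix.
const NEGATION_WORDS = ['not', 'never', 'neither'];

// Prefix "not" to JJ/MD/RB/VB/VBD/VBN tokens, skipping whole-word negations.
function prefixNot(string $tagged): string
{
    return preg_replace_callback(
        '/(\w+)\/(JJ|MD|RB|VB|VBD|VBN)\b/',
        function (array $m): string {
            // $m[1] is the word, $m[2] the tag; leave negation words as they are.
            if (in_array(strtolower($m[1]), NEGATION_WORDS, true)) {
                return $m[0];
            }
            return 'not' . $m[1] . '/' . $m[2];
        },
        $tagged
    );
}

$postagged = 'I/NN will/MD never/RB go/VB to/TO their/PRP$ place/NN again/RB';

// Same tag-removal step as in negate().
echo preg_replace('/\/[A-Z$]+/', '', prefixNot($postagged)), "\n";
// I notwill never notgo to their place notagain
```

Because the check is an exact word comparison, a token that merely starts with "not" (such as notable/JJ) is still negated, and the stoplist can be extended with other negators such as "no" or "nor".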

[edit] Also, I would very much welcome any other comments or critiques of this negation implementation, since I'm sure it's (still) quite flawed :-)
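On the broader design: negate() builds a "token/TAG" string only to take it apart again with regexes. A possible alternative, assuming the tagger returns an array of ['token' => ..., 'tag' => ...] entries as the foreach loop in negate() implies, is to decide per token and join the words at the end. The function name negateTokens() and its default tag and stoplist arguments are hypothetical, and the $tags array below is built by hand to mirror the tagged output shown above rather than produced by the real tagger.

```php
<?php
// Sketch: negate on the tagger's token/tag pairs directly, assuming each entry
// looks like ['token' => ..., 'tag' => ...] as in the question's foreach loop.
function negateTokens(
    array $tags,
    array $targetTags = ['JJ', 'MD', 'RB', 'VB', 'VBD', 'VBN'],
    array $negationWords = ['not', 'never', 'neither']
): string {
    $words = [];
    foreach ($tags as $t) {
        $token = trim($t['token']);
        $tag   = trim($t['tag']);
        // Prefix "not" only for target tags, and never for the negation words themselves.
        $isTarget   = in_array($tag, $targetTags, true);
        $isNegation = in_array(strtolower($token), $negationWords, true);
        $words[] = ($isTarget && !$isNegation) ? 'not' . $token : $token;
    }
    return implode(' ', $words);
}

// Hand-built stand-in for $tagger->tag("I will never go to their place again").
$tags = [
    ['token' => 'I',     'tag' => 'NN'],
    ['token' => 'will',  'tag' => 'MD'],
    ['token' => 'never', 'tag' => 'RB'],
    ['token' => 'go',    'tag' => 'VB'],
    ['token' => 'to',    'tag' => 'TO'],
    ['token' => 'their', 'tag' => 'PRP$'],
    ['token' => 'place', 'tag' => 'NN'],
    ['token' => 'again', 'tag' => 'RB'],
];

echo negateTokens($tags), "\n"; // I notwill never notgo to their place notagain
```

Comparing tags with in_array() avoids any escaping issues with tags such as PRP$, and the negation check is an exact, case-insensitive word match rather than a pattern.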

asked May 01 '12 13:05 by Pr0no


1 Answer

Give this a try:

$sentence = preg_replace("/(\s)(?:(?!never|neither|not)(\w*))\/(JJ|MD|RB|VB|VBD|VBN)\b/", "$1not$2", $sentence);
answered Oct 11 '22 18:10 by Nate
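The answer's lookahead (?!never|neither|not) has no right-hand boundary, so it also skips any word that merely starts with "not", "never" or "neither" (for example "notably" or "nevertheless"), and the leading (\s) means a target word at the very start of the string can never match. The sketch below compares the answer's pattern with a tightened variant that is an assumption of this edit, not part of the answer: the lookahead must be followed by "/" so only whole words are excluded, (?i:...) makes that check case-insensitive, and (?<!\S) replaces (\s) so the first token can match too. stripPosTags() and negateWith() are hypothetical helper names wrapping the question's tag-removal step.

```php
<?php
// The answer's pattern and replacement, unchanged (it drops the "/TAG" of negated words).
const ANSWER_PATTERN = '/(\s)(?:(?!never|neither|not)(\w*))\/(JJ|MD|RB|VB|VBD|VBN)\b/';
const ANSWER_REPLACEMENT = '$1not$2';

// Tightened variant (an assumption, not from the answer):
//   (?<!\S)                       token starts at string start or after whitespace
//   (?!(?i:never|neither|not)\/)  skip only the whole words never/neither/not, any case
const TIGHT_PATTERN = '/(?<!\S)(?!(?i:never|neither|not)\/)(\w+)\/(JJ|MD|RB|VB|VBD|VBN)\b/';
const TIGHT_REPLACEMENT = 'not$1/$2';

// The question's tag-removal step.
function stripPosTags(string $s): string
{
    return preg_replace('/\/[A-Z$]+/', '', $s);
}

// Apply one negation pattern, then strip the remaining tags.
function negateWith(string $pattern, string $replacement, string $tagged): string
{
    return stripPosTags(preg_replace($pattern, $replacement, $tagged));
}

$tagged = 'He/PRP was/VBD notably/RB late/JJ';
echo negateWith(ANSWER_PATTERN, ANSWER_REPLACEMENT, $tagged), "\n"; // He notwas notably notlate
echo negateWith(TIGHT_PATTERN, TIGHT_REPLACEMENT, $tagged), "\n";   // He notwas notnotably notlate
```

On the question's own example both patterns give "I notwill never notgo to their place notagain"; the difference only shows up for words that share a prefix with a negation word or sit at the start of the sentence.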