Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Natural language parsing, practical example

Tags:

java

nlp

I am looking to use a natural language parsing library for a simple chat bot. I can get the Parts of Speech tags, but I always wonder. What do you do with the POS. If I know the parts of the speech, what then?

I guess it would help with the responses. But what data structures and architecture could I use.

like image 936
Berlin Brown Avatar asked Dec 05 '22 06:12

Berlin Brown


1 Answers

A part-of-speech tagger assigns labels to the words in the input text. For example, the popular Penn Treebank tagset has some 40 labels, such as "plural noun", "comparative adjective", "past tense verb", etc. The tagger also resolves some ambiguity. For example, many English word forms can be either nouns or verbs, but in the context of other words, their part of speech is unambiguous. So, having annotated your text with POS tags you can answer questions like: how many nouns do I have?, how many sentences do not contain a verb?, etc.

For a chatbot, you obviously need much more than that. You need to figure out the subjects and objects in the text, and which verb (predicate) they attach to; you need to resolve anaphors (which individual does a he or she point to), what is the scope of negation and quantifiers (e.g. every, more than 3), etc.

Ideally, you need to map you input text into some logical representation (such as first-order logic), which would let you bring in reasoning to determine if two sentences are equivalent in meaning, or in an entailment relationship, etc.

While a POS-tagger would map the sentence

Mary likes no man who owns a cat.

to such a structure

Mary/NNP likes/VBZ no/DT man/NN who/WP owns/VBZ a/DT cat/NN ./.

you would rather need something like this:

SubClassOf(
   ObjectIntersectionOf(
      Class(:man)
      ObjectSomeValuesFrom(
         ObjectProperty(:own)
         Class(:cat)
      )
   )
   ObjectComplementOf(
      ObjectSomeValuesFrom(
         ObjectInverseOf(ObjectProperty(:like))
         ObjectOneOf(
            NamedIndividual(:Mary)
         )
      )
   )
)

Of course, while POS-taggers get precision and recall values close to 100%, more complex automatic processing will perform much worse.

A good Java library for NLP is LingPipe. It doesn't, however, go much beyond POS-tagging, chunking, and named entity recognition.

like image 147
Kaarel Avatar answered Dec 07 '22 20:12

Kaarel