
Extracting Nouns and Verbs from Text

Tags:

r

Is it possible to extract nouns and verbs separately with the R package openNLP? I use the tagPOS function, which tags the sentence, but how do I then extract the verbs and the nouns separately?

Shreyas Karnik asked Jun 04 '10 00:06



1 Answer

Here is an example that extracts the words tagged /VBx, where x is any single character (the pattern also matches a plain /VB):

library("openNLP")

acq <- "Gulf Applied Technologies Inc said it sold its subsidiaries engaged in pipeline and terminal operations for 12.2 mln dlrs. The company said the sale is subject to certain post closing adjustments, which it did not explain. Reuter."

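# Tag each word with its part of speech; the result is a string of word/TAG tokens
acqTag <- tagPOS(acq)

# Split at every verb tag (/VB, /VBD, ...), then keep the last word of each piece,
# which is the word that carried the tag
sapply(strsplit(acqTag, "[[:punct:]]*/VB.?"), function(x) sub("(^.*\\s)(\\w+$)", "\\2", x))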

     [,1]                           
[1,] "said"                         
[2,] "sold"                         
[3,] "engaged"                      
[4,] "said"                         
[5,] "is"                           
[6,] "did"                          
[7,] " not/RB explain./NN Reuter./."

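OK, my regular expression needs some improvement to get rid of the last line in the result. That line is the text left over after the last verb tag; it does not end in a word character, so sub() finds no match and returns it unchanged.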

EDIT

An alternative could be to ignore rows containing a space character:

sapply(strsplit(acqTag, "[[:punct:]]*/VB.?"), function(x) {res <- sub("(^.*\\s)(\\w+$)", "\\2", x); res[!grepl("\\s", res)]})
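
Since the question asks for nouns and verbs separately, another option might be to split the tagged string into word/TAG tokens and filter on the tag, rather than splitting at the tag itself. The following is only a minimal sketch in base R: the sample string is hand-written in the word/TAG format (it is not real tagPOS output), extract_by_tag is a hypothetical helper name, and the "VB" and "NN" prefixes assume Penn Treebank-style tags, as the /VBx pattern above suggests.

```r
# Hand-written sample in the "word/TAG" format; NOT real tagPOS() output.
tagged <- "The/DT company/NN said/VBD it/PRP sold/VBD its/PRP$ subsidiaries/NNS ./."

# Hypothetical helper: return the words whose tag starts with `prefix`.
extract_by_tag <- function(tagged, prefix) {
  tokens <- strsplit(tagged, "\\s+")[[1]]
  tokens <- tokens[nzchar(tokens)]          # drop empty strings from leading spaces
  # Split each token at its last "/" so a word containing "/" stays intact.
  words <- sub("/[^/]*$", "", tokens)
  tags  <- sub("^.*/", "", tokens)
  words[startsWith(tags, prefix)]
}

verbs <- extract_by_tag(tagged, "VB")  # assumed verb tags: VB, VBD, VBG, ...
nouns <- extract_by_tag(tagged, "NN")  # assumed noun tags: NN, NNS, NNP, ...

print(verbs)  # "said" "sold"
print(nouns)  # "company" "subsidiaries"
```

With a real tagged string, extract_by_tag(acqTag, "VB") should avoid the leftover last row, because every token is checked against its own tag. Punctuation attached to a word by the tagger would still need to be stripped separately.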
George Dontas answered Nov 19 '22 22:11