I was wondering whether it is possible to extract nouns and verbs separately with the R package openNLP. I use the tagPOS function, which tags the sentence, but what should I do if I want to extract the verbs and the nouns separately?
Noun phrases consist of a noun and all of its modifiers. Modifiers can include adjectives, articles, participles, or possessive nouns and pronouns, just to name a few. Noun phrases can function as any noun in the sentence, whether as subjects, objects, or subject complements.
The chunk package provides classes and interfaces for identifying non-overlapping linguistic groups (such as base noun phrases) in unrestricted text. This task is called “chunk parsing” or “chunking”, and the identified groups are called “chunks”.
Here is an example that extracts the words tagged /VB or /VBx, where x is any single character:
library("openNLP")
acq <- "Gulf Applied Technologies Inc said it sold its subsidiaries engaged in pipeline and terminal operations for 12.2 mln dlrs. The company said the sale is subject to certain post closing adjustments, which it did not explain. Reuter."
acqTag <- tagPOS(acq)
sapply(strsplit(acqTag,"[[:punct:]]*/VB.?"),function(x) sub("(^.*\\s)(\\w+$)", "\\2", x))
     [,1]
[1,] "said"
[2,] "sold"
[3,] "engaged"
[4,] "said"
[5,] "is"
[6,] "did"
[7,] " not/RB explain./NN Reuter./."
OK, my regular expression needs some improvement to get rid of the last row of the result. That row is the leftover text after the final /VB match, which strsplit returns as a trailing piece.
EDIT
An alternative is to drop the elements that still contain a whitespace character:
sapply(strsplit(acqTag, "[[:punct:]]*/VB.?"), function(x) {
  res <- sub("(^.*\\s)(\\w+$)", "\\2", x)
  res[!grepl("\\s", res)]  # keep only single words, drop leftovers containing whitespace
})
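A different way to get verbs and nouns separately, sketched here as an assumption rather than as part of the answer above, is to split the tagged string into word/TAG tokens and filter on the tag. The helper name extract_by_tag and the sample string are hypothetical. The string only mimics the "word/TAG" shape that tagPOS returns, so the sketch runs in base R without openNLP.

```r
# Hypothetical sample in the "word/TAG" shape produced by tagPOS();
# in practice you would pass the tagPOS() result (e.g. acqTag) instead.
tagged <- "The/DT company/NN said/VBD the/DT sale/NN is/VBZ subject/JJ to/TO certain/JJ adjustments/NNS ./."

# extract_by_tag is an illustrative helper, not an openNLP function.
extract_by_tag <- function(tagged, tag_pattern) {
  tokens <- strsplit(tagged, "\\s+")[[1]]  # one "word/TAG" token per element
  tokens <- tokens[nzchar(tokens)]         # drop empty strings from leading spaces
  words  <- sub("/[^/]*$", "", tokens)     # text before the last slash
  tags   <- sub("^.*/", "", tokens)        # text after the last slash
  words[grepl(tag_pattern, tags)]          # keep words whose tag matches
}

verbs <- extract_by_tag(tagged, "^VB")  # VB, VBD, VBZ, ...
nouns <- extract_by_tag(tagged, "^NN")  # NN, NNS, NNP, ...

print(verbs)  # "said" "is"
print(nouns)  # "company" "sale" "adjustments"
```

Because each token is split at its last slash, there is no trailing remainder to filter out, and the same function covers any other tag family by changing the pattern.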