How to use OpenNLP to get POS tags in R?

Here is the R code:

library(NLP)
library(openNLP)

tagPOS <- function(x, ...) {
  s <- as.String(x)
  word_token_annotator <- Maxent_Word_Token_Annotator()
  # Treat the whole input as a single sentence, then tokenize it into words
  a2 <- Annotation(1L, "sentence", 1L, nchar(s))
  a2 <- annotate(s, word_token_annotator, a2)
  # Add POS tags to the word annotations
  a3 <- annotate(s, Maxent_POS_Tag_Annotator(), a2)
  a3w <- a3[a3$type == "word"]
  POStags <- unlist(lapply(a3w$features, `[[`, "POS"))
  POStagged <- paste(sprintf("%s/%s", s[a3w], POStags), collapse = " ")
  list(POStagged = POStagged, POStags = POStags)
}

str <- "this is a the first sentence."
tagged_str <- tagPOS(str)

Output:

> tagged_str$POStagged
[1] "this/DT is/VBZ a/DT the/DT first/JJ sentence/NN ./."

Now I want to extract only the NN word (i.e. "sentence") from the above sentence and store it in a variable. Can anyone help me with this?

asked Jun 23 '15 06:06

user4599

1 Answer

Here is a more general solution, where you can describe the Treebank tag you desire to extract using a regular expression. In your case for instance, "NN" returns all noun types (e.g. NN, NNS, NNP, NNPS) while "NN$" returns just NN.

It operates on a single character string, so if you have several texts in a vector or list, you will need to lapply() over them as in the examples below.

library(NLP)
library(openNLP)

txt <- c("This is a short tagging example, by John Doe.",
         "Too bad OpenNLP is so slow on large texts.")

extractPOS <- function(x, thisPOSregex) {
    x <- as.String(x)
    # Annotate sentences and words, then add POS tags to the words
    wordAnnotation <- annotate(x, list(Maxent_Sent_Token_Annotator(), Maxent_Word_Token_Annotator()))
    POSAnnotation <- annotate(x, Maxent_POS_Tag_Annotator(), wordAnnotation)
    POSwords <- subset(POSAnnotation, type == "word")
    tags <- sapply(POSwords$features, '[[', "POS")
    # Keep only the tokens whose tag matches the regular expression
    thisPOSindex <- grep(thisPOSregex, tags)
    tokenizedAndTagged <- sprintf("%s/%s", x[POSwords][thisPOSindex], tags[thisPOSindex])
    untokenizedAndTagged <- paste(tokenizedAndTagged, collapse = " ")
    untokenizedAndTagged
}

lapply(txt, extractPOS, "NN")
## [[1]]
## [1] "tagging/NN example/NN John/NNP Doe/NNP"
## 
## [[2]]
## [1] "OpenNLP/NNP texts/NNS"
lapply(txt, extractPOS, "NN$")
## [[1]]
## [1] "tagging/NN example/NN"
## 
## [[2]]
## [1] ""
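
If the goal is to keep only the bare words in a variable, as the question asks, one option is to post-process the "word/TAG" string. The following is a minimal sketch in base R, assuming the input has the same format as the POStagged output of tagPOS() in the question; the variable names (tagged, tokens, words, tags, nn_words) are illustrative and not part of openNLP.

```r
# Tagged string in the "word/TAG" format produced by tagPOS() in the question.
tagged <- "this/DT is/VBZ a/DT the/DT first/JJ sentence/NN ./."

# Split into individual "word/TAG" tokens on single spaces.
tokens <- strsplit(tagged, " ", fixed = TRUE)[[1]]

# Split each token at its last "/" so that words containing "/" survive.
words <- sub("/[^/]*$", "", tokens)
tags  <- sub("^.*/", "", tokens)

# Keep only the words whose tag is exactly NN.
nn_words <- words[grepl("^NN$", tags)]
nn_words
## [1] "sentence"
```

Replacing the pattern "^NN$" with "^NN" would also keep NNS, NNP and NNPS words, mirroring the "NN" versus "NN$" distinction above.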

answered Oct 04 '22 08:10

Ken Benoit

Tags: r, nlp, text-mining, pos-tagger, opennlp