I am writing an R script and am using library(ngram).
Suppose I have a string,
"good qualiti dog food bought sever vital can dog food product found good qualiti product look like stew process meat smell better labrador finicki appreci product better"
and want to find bi-grams.
The ngram library is giving me bi-grams as follows:
"appreci product" "process meat" "food product" "food bought" "qualiti dog" "product found" "product look" "look like" "like stew" "good qualiti" "labrador finicki" "bought sever" "qualiti product" "better labrador" "dog food" "smell better" "vital can" "meat smell" "found good" "sever vital" "stew process" "can dog" "finicki appreci" "product better"
As the sentence contains "dog food" two times, I want this bi-gram two times. But I am getting it once!
Is there an option in the ngram library, or in any other R library, that gives all the bi-grams of my sentence, including repeats?
A pair of words is called a “bigram”. More generally, a token comprising n words is called an “n-gram” (or “ngram”). Tokenising on bigrams or n-grams enables you to capture and examine the correlations between words and, more importantly, the immediate context around each word.
An n-gram is a collection of n successive items in a text document that may include words, numbers, symbols, and punctuation. N-gram models are useful in many text analytics applications where sequences of words are relevant, such as in sentiment analysis, text classification, and text generation.
N-grams are contiguous sequences of words, symbols, or tokens in a document. In technical terms, they can be defined as the neighbouring sequences of items in a document. They come into play when we deal with text data in NLP (Natural Language Processing) tasks.
An n-gram is simply any sequence of n tokens (words). Consequently, given the following review text - “Absolutely wonderful - silky and sexy and comfortable”, we could break this up into: 1-grams: Absolutely, wonderful, silky, and, sexy, and, comfortable.
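The definition above can be sketched in base R without any package. This is only an illustration: `make_ngrams` is a hypothetical helper name, and the tokenisation rule (split on anything that is not a letter, so the standalone "-" is dropped) is an assumption chosen to match the 1-grams listed above.

```r
# Hypothetical helper (not part of the ngram package): returns every
# n-gram in order of appearance, keeping repeats.
make_ngrams <- function(text, n = 2) {
  # Assumed tokenisation: split on runs of non-letter characters
  tokens <- strsplit(text, "[^[:alpha:]]+")[[1]]
  tokens <- tokens[nzchar(tokens)]          # drop empty strings, if any
  if (length(tokens) < n) return(character(0))
  starts <- seq_len(length(tokens) - n + 1) # start index of each window
  vapply(
    starts,
    function(i) paste(tokens[i:(i + n - 1)], collapse = " "),
    character(1)
  )
}

review <- "Absolutely wonderful - silky and sexy and comfortable"
unigrams <- make_ngrams(review, n = 1)
bigrams  <- make_ngrams(review, n = 2)
print(unigrams)
print(bigrams)
```

Because the function slides a window over the tokens rather than collecting unique phrases, a word such as "and" that occurs twice is also returned twice.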
The development version of ngram has a get.phrasetable method:
devtools::install_github("wrathematics/ngram")
library(ngram)
text <- "good qualiti dog food bought sever vital can dog food product found good qualiti product look like stew process meat smell better labrador finicki appreci product better"
ng <- ngram(text)
head(get.phrasetable(ng))
#            ngrams freq       prop
# 1    good qualiti    2 0.07692308
# 2        dog food    2 0.07692308
# 3 appreci product    1 0.03846154
# 4    process meat    1 0.03846154
# 5    food product    1 0.03846154
# 6     food bought    1 0.03846154
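If what you need is a vector in which each bi-gram appears as many times as it occurs, one possible approach is to expand the phrase table by its freq column. The sketch below is self-contained: it rebuilds the six rows shown above by hand instead of calling get.phrasetable, and it assumes the result is a data frame with ngrams and freq columns, as that output suggests. The names `pt` and `all_bigrams` are illustrative only.

```r
# Hand-built stand-in for head(get.phrasetable(ng)), copied from the
# output above (assumption: the real object has the same column names).
pt <- data.frame(
  ngrams = c("good qualiti", "dog food", "appreci product",
             "process meat", "food product", "food bought"),
  freq   = c(2L, 2L, 1L, 1L, 1L, 1L),
  stringsAsFactors = FALSE
)

# Repeat each bi-gram according to its frequency
all_bigrams <- rep(pt$ngrams, times = pt$freq)
print(all_bigrams)
```

Note that the expanded vector follows the order of the phrase table, not the order in which the bi-grams appear in the sentence.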
In addition, you can use the print() method and specify output = "full". That is:
print(ng, output = "full")
# NOTE: more output not shown...
# better labrador | 1
# finicki {1} |
#
# dog food | 2
# product {1} | bought {1}
# NOTE: more output not shown...