Why does the ngram() function return only distinct bigrams?

Tags: r, nlp, n-gram

I am writing an R script and am using library(ngram).

Suppose I have a string,

"good qualiti dog food bought sever vital can dog food product found good qualiti product look like stew process meat smell better labrador finicki appreci product better"

and want to find bi-grams.

The ngram library is giving me bi-grams as follows:

"appreci product" "process meat" "food product" "food bought" "qualiti dog" "product found" "product look" "look like" "like stew" "good qualiti" "labrador finicki" "bought sever" "qualiti product" "better labrador" "dog food" "smell better" "vital can" "meat smell" "found good" "sever vital" "stew process" "can dog" "finicki appreci" "product better"

The sentence contains "dog food" twice (and "good qualiti" twice), so I expect each of those bi-grams to appear twice, but I only get each of them once.

Is there an option in the ngram library, or in any other R library, that returns all the bi-grams of my sentence, including repeats?
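For comparison, the repeated bi-grams can be produced without any package, which makes the expected result concrete. This is a minimal base-R sketch, assuming the text is already cleaned and that whitespace is the only token separator; `all_bigrams()` is a hypothetical helper name, not part of ngram.

```r
# Hypothetical helper: return every bigram in text order, duplicates included.
# Assumes tokens are separated by whitespace only.
all_bigrams <- function(text) {
  words <- strsplit(trimws(text), "\\s+")[[1]]
  if (length(words) < 2) return(character(0))
  # Pair each word with its successor: words[1..n-1] with words[2..n]
  paste(head(words, -1), tail(words, -1))
}

text <- "good qualiti dog food bought sever vital can dog food product found good qualiti product look like stew process meat smell better labrador finicki appreci product better"

bg <- all_bigrams(text)
length(bg)             # 26 bigrams from 27 words
sum(bg == "dog food")  # 2
length(unique(bg))     # 24, the distinct set returned in the question
```

A sentence of n words always has n - 1 bigrams, so the 24 values listed above are what remains after the two duplicates are collapsed.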


asked Sep 29 '15 17:09

KrunalParmar

1 Answer

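The development version of ngram has a get.phrasetable() method, which reports each distinct n-gram together with its frequency, so repeated bigrams are counted rather than lost: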

```r
devtools::install_github("wrathematics/ngram")
library(ngram)

text <- "good qualiti dog food bought sever vital can dog food product found good qualiti product look like stew process meat smell better labrador finicki appreci product better"

ng <- ngram(text)
head(get.phrasetable(ng))
#            ngrams freq       prop
# 1    good qualiti    2 0.07692308
# 2        dog food    2 0.07692308
# 3 appreci product    1 0.03846154
# 4    process meat    1 0.03846154
# 5    food product    1 0.03846154
# 6     food bought    1 0.03846154
```
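If installing the development version is not an option, a similar frequency table can be sketched in base R. This is not how ngram computes it; it simply counts each distinct bigram with `table()` and divides by the total number of bigrams (26 here, from 27 words), which matches the `prop` column above: 2/26 ≈ 0.0769 and 1/26 ≈ 0.0385. The name `phrases` is illustrative only.

```r
# Base-R sketch of a get.phrasetable()-style table (not ngram's implementation).
text <- "good qualiti dog food bought sever vital can dog food product found good qualiti product look like stew process meat smell better labrador finicki appreci product better"

words <- strsplit(trimws(text), "\\s+")[[1]]
bg <- paste(head(words, -1), tail(words, -1))  # all 26 bigrams, repeats kept

counts <- table(bg)  # one entry per distinct bigram
phrases <- data.frame(
  ngrams = names(counts),
  freq   = as.integer(counts),
  prop   = as.numeric(counts) / length(bg),  # share of all bigrams
  stringsAsFactors = FALSE
)

# Most frequent first; ties are broken alphabetically here, which may
# differ from the order ngram uses.
phrases <- phrases[order(-phrases$freq, phrases$ngrams), ]
rownames(phrases) <- NULL
head(phrases)
```

Because `freq` keeps the counts, `rep(phrases$ngrams, phrases$freq)` expands the table back into all 26 bigrams, grouped by bigram rather than in text order.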

In addition, you can call the print() method with output = "full":

```r
print(ng, output = "full")
```

```
# NOTE: more output not shown...
better labrador | 1
finicki {1} |

dog food | 2
product {1} | bought {1}
# NOTE: more output not shown...
```
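Reading the full output, each bigram is printed with its count, and the next line appears to list the words that follow it in the text with their own counts ("dog food" is followed once by "bought" and once by "product"). The base-R sketch below reproduces that grouping under this interpretation; it is an illustration, not ngram's own code, and the variable names are hypothetical.

```r
# Base-R sketch: for each bigram, collect the word that follows it in the text.
text <- "good qualiti dog food bought sever vital can dog food product found good qualiti product look like stew process meat smell better labrador finicki appreci product better"

words <- strsplit(trimws(text), "\\s+")[[1]]
bg <- paste(head(words, -1), tail(words, -1))

# The bigram starting at position i is followed by words[i + 2];
# the last bigram has no follower, so it gets NA.
nxt <- c(words[-(1:2)], NA)
successors <- split(nxt, bg)  # list named by distinct bigram

for (b in c("better labrador", "dog food")) {
  followers <- table(successors[[b]])  # table() drops NA by default
  cat(b, "|", sum(bg == b), "\n")
  # Followers are sorted alphabetically here, so order may differ from ngram
  cat(paste0(names(followers), " {", followers, "}"), "\n\n")
}
```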

answered Sep 28 '22 03:09

JasonAizkalns
