
What algorithm do I need to find n-grams?

Tags:

r

n-gram

What algorithm is used for finding n-grams?

Supposing my input data is an array of words and the size of the n-grams I want to find, what algorithm should I use?

I'm asking for code, with a preference for R. The data is stored in a database, so it could also be a PL/pgSQL function. Java is the language I know best, so I can "translate" it into another language if needed.

I'm not lazy; I'm only asking for code because I don't want to reinvent the wheel by writing an algorithm that is already done.

Edit: it's important to know how many times each n-gram appears.
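To make the requirement concrete, here is a rough sketch in Java (class and method names are just illustrative) of the naive sliding-window approach: walk the word array, join each window of n words into a key, and tally it in a map.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class NGrams {
    // Slide a window of size n over the word array and count each n-gram.
    static Map<String, Integer> countNGrams(String[] words, int n) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (int i = 0; i <= words.length - n; i++) {
            StringBuilder sb = new StringBuilder(words[i]);
            for (int j = 1; j < n; j++) {
                sb.append(' ').append(words[i + j]);
            }
            counts.merge(sb.toString(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] words = {"the", "cat", "sat", "on", "the", "cat"};
        // Bigrams: {the cat=2, cat sat=1, sat on=1, on the=1}
        System.out.println(countNGrams(words, 2));
    }
}
```

This is O(n · length) and trivially translates to R, PL/pgSQL, or anything else with an associative array.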

Edit 2: is there an R package for n-grams?

Renato Dinhani asked Nov 17 '11

People also ask

What is n-gram algorithm?

An n-gram model is a type of probabilistic language model for predicting the next item in a sequence, in the form of an (n − 1)-order Markov model.

What is n-gram range in NLP?

Based on the results, the model performs at its best with the n-gram range of (1,5). This means that training the model with n-grams ranging from unigrams to 5-grams help achieve optimal results, but larger n-grams only result in more sparse input features, which hampers model performance.

What is n-gram model in AI?

It's a probabilistic model that's trained on a corpus of text. Such a model is useful in many NLP applications including speech recognition, machine translation and predictive text input. An N-gram model is built by counting how often word sequences occur in corpus text and then estimating the probabilities.
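As a toy illustration of that counting step (the corpus and names here are made up for the example), a maximum-likelihood bigram estimate divides the count of a word pair by the count of its first word:

```java
import java.util.HashMap;
import java.util.Map;

public class BigramModel {
    // Maximum-likelihood estimate: P(next | prev) = count(prev next) / count(prev)
    static double bigramProb(String[] corpus, String prev, String next) {
        Map<String, Integer> unigrams = new HashMap<>();
        Map<String, Integer> bigrams = new HashMap<>();
        for (int i = 0; i < corpus.length; i++) {
            unigrams.merge(corpus[i], 1, Integer::sum);
            if (i + 1 < corpus.length) {
                bigrams.merge(corpus[i] + " " + corpus[i + 1], 1, Integer::sum);
            }
        }
        return (double) bigrams.getOrDefault(prev + " " + next, 0) / unigrams.get(prev);
    }

    public static void main(String[] args) {
        String[] corpus = {"I", "like", "tea", "I", "like", "coffee"};
        // "like" occurs twice and is followed once by "tea", so P = 0.5
        System.out.println(bigramProb(corpus, "like", "tea"));
    }
}
```

Real models add smoothing so unseen n-grams don't get probability zero, but the counting core is just this.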


2 Answers

If you want to use R to identify n-grams, you can use the tm package and the RWeka package. It will tell you how many times each n-gram occurs in your documents, like so:

  library("RWeka")
  library("tm")

  data("crude")

  BigramTokenizer <- function(x) NGramTokenizer(x, Weka_control(min = 2, max = 2))
  tdm <- TermDocumentMatrix(crude, control = list(tokenize = BigramTokenizer))

  inspect(tdm[340:345,1:10])

A term-document matrix (6 terms, 10 documents)

Non-/sparse entries: 4/56
Sparsity           : 93%
Maximal term length: 13 
Weighting          : term frequency (tf)

               Docs
Terms           127 144 191 194 211 236 237 242 246 248
  and said        0   0   0   0   0   0   0   0   0   0
  and security    0   0   0   0   0   0   0   0   1   0
  and set         0   1   0   0   0   0   0   0   0   0
  and six-month   0   0   0   0   0   0   0   1   0   0
  and some        0   0   0   0   0   0   0   0   0   0
  and stabilise   0   0   0   0   0   0   0   0   0   1

hat-tip: http://tm.r-forge.r-project.org/faq.html

Ben answered Oct 04 '22


For anyone still interested in this topic, there is already a package on CRAN.

ngram: An n-gram Babbler

This package offers utilities for creating, displaying, and "babbling" n-grams. The babbler is a simple Markov process.

http://cran.r-project.org/web/packages/ngram/index.html

IceBruce answered Oct 04 '22