Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Text summarization in R language

I have long text file using help of R language I want to summarize text in at least 10 to 20 line or in small sentences. How to summarize text in at least 10 line with R language ?

like image 989
Mehul Katara Avatar asked Oct 27 '25 00:10

Mehul Katara


2 Answers

You may try this (from the LSAfun package):

genericSummary(D,k=1)

whereby 'D' specifies your text document and 'k' the number of sentences to be used in the summary. (Further modifications are shown in the package documentation).

For more information: http://search.r-project.org/library/LSAfun/html/genericSummary.html

like image 53
Daniel Avatar answered Oct 28 '25 14:10

Daniel


There's a package called lexRankr that summarizes text in the same way that Reddit's /u/autotldr bot summarizes articles. This article has a full walkthrough on how to use it but just as a quick example so you can test it yourself in R:

#load needed packages
library(xml2)
library(rvest)
library(lexRankr)

#url to scrape
monsanto_url = "https://www.theguardian.com/environment/2017/sep/28/monsanto-banned-from-european-parliament"

#read page html
page = xml2::read_html(monsanto_url)
#extract text from page html using selector
page_text = rvest::html_text(rvest::html_nodes(page, ".js-article__body p"))

#perform lexrank for top 3 sentences
top_3 = lexRankr::lexRank(page_text,
                          #only 1 article; repeat same docid for all of input vector
                          docId = rep(1, length(page_text)),
                          #return 3 sentences to mimick /u/autotldr's output
                          n = 3,
                          continuous = TRUE)

#reorder the top 3 sentences to be in order of appearance in article
order_of_appearance = order(as.integer(gsub("_","",top_3$sentenceId)))
#extract sentences in order of appearance
ordered_top_3 = top_3[order_of_appearance, "sentence"]

> ordered_top_3
[1] "Monsanto lobbyists have been banned from entering the European parliament after the multinational refused to attend a parliamentary hearing into allegations of regulatory interference."
[2] "Monsanto officials will now be unable to meet MEPs, attend committee meetings or use digital resources on parliament premises in Brussels or Strasbourg."                                
[3] "A Monsanto letter to MEPs seen by the Guardian said that the European parliament was not “an appropriate forum” for discussion on the issues involved."  
like image 20
RDRR Avatar answered Oct 28 '25 13:10

RDRR