R text mining package: allowing new documents to be incorporated into an existing corpus

I was wondering whether R's text mining package (tm) has, or could have, the following feature:

myCorpus <- Corpus(DirSource(<directory-containing-textfiles>),control=...)
# add docs
myCorpus.addDocs(DirSource(<new-dir>),control=...)

Ideally I would like to incorporate additional documents into the existing corpus.

Any help is appreciated.

Asked by Shivani Rao, Jul 07 '11 at 20:07


People also ask

What is the package used in R for text mining?

The temis package for R provides a graphical, integrated text-mining solution. It can be used for many text-mining tasks, such as importing and cleaning a corpus, counting terms and documents, finding term co-occurrences, and performing correspondence analysis.

What is the main structure for managing documents in the tm package?

The tm package uses the corpus as its main structure. A corpus is simply a collection of documents, but, like most objects in R, it has specific attributes (such as per-document metadata) that enable certain types of analysis.
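To make the idea concrete, here is a minimal sketch that builds a corpus from an in-memory character vector with `VectorSource()` and `VCorpus()`, then inspects its size, the text of one document, and that document's metadata. The two document texts are made up for illustration.

```r
library(tm)

# Two hypothetical documents held in a character vector
texts <- c("Oil prices rose sharply today.",
           "The company announced an acquisition.")

# VectorSource() treats each element as one document; VCorpus() stores them
corpus <- VCorpus(VectorSource(texts))

print(length(corpus))          # number of documents in the corpus
print(content(corpus[[1]]))    # raw text of the first document
print(meta(corpus[[1]]))       # per-document metadata (id, language, ...)
```

Each element of the corpus is a text document object, so content and metadata are accessed per document rather than on the corpus as a whole.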

What is corpus in text mining?

A text corpus is a large, unstructured set of texts (nowadays usually stored and processed electronically) used for statistical analysis and hypothesis testing, checking occurrences, or validating linguistic rules within a specific language territory.

What is the tm package used for in R?

For text mining in R, there is a useful package called tm, which provides functions for handling, processing, and managing text. The package operates on a corpus, that is, a collection of text documents.


1 Answer

You should be able to just combine corpora with c(), as in:

> library(tm)
> data("acq")
> data("crude")
> together <- c(acq,crude)
> acq
A corpus with 50 text documents
> crude
A corpus with 20 text documents
> together
A corpus with 70 text documents

You can find more in the tm package documentation under tm_combine (?tm_combine).
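Applied to the directory-based scenario in the question, a minimal sketch might look like the following. The directory names, file names, and document texts are hypothetical; they are created under `tempdir()` only so the example runs on its own. It uses `VCorpus()` rather than `Corpus()` because, in recent tm versions, `Corpus()` may return a `SimpleCorpus`, and I'm not certain `c()` supports that class.

```r
library(tm)

# Hypothetical directories, created here so the example is self-contained;
# substitute your own paths in practice.
old_dir <- file.path(tempdir(), "corpus_old")
new_dir <- file.path(tempdir(), "corpus_new")
unlink(c(old_dir, new_dir), recursive = TRUE)   # start from empty directories
dir.create(old_dir)
dir.create(new_dir)
writeLines("first document", file.path(old_dir, "doc1.txt"))
writeLines("second document", file.path(old_dir, "doc2.txt"))
writeLines("a newly added document", file.path(new_dir, "doc3.txt"))

# The existing corpus, built from the original directory
myCorpus <- VCorpus(DirSource(old_dir))

# Build a second corpus from the new directory, then append it with c()
newDocs <- VCorpus(DirSource(new_dir))
myCorpus <- c(myCorpus, newDocs)

print(length(myCorpus))   # documents from both directories
```

There is no in-place `addDocs`-style method here: `c()` returns a new combined corpus, which is reassigned to `myCorpus`.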

Answered by Henry, Oct 06 '22 at 04:10