I was wondering if there is any chance of R's text mining package having the following feature:
myCorpus <- Corpus(DirSource(<directory-containing-textfiles>),control=...)
# add docs
myCorpus.addDocs(DirSource(<new-dir>),control=...)
Ideally I would like to incorporate additional documents into the existing corpus.
Any help is appreciated.
The temis package in R provides a graphical integrated text-mining solution. This package can be leveraged for many text-mining tasks, such as importing and cleaning a corpus, counting terms and documents, finding term co-occurrences, correspondence analysis, and so on.
The tm package uses the Corpus as its main structure. A corpus is simply a collection of documents, but like most things in R, the corpus has specific attributes that enable certain types of analysis.
A text corpus is a large and unstructured set of texts (nowadays usually electronically stored and processed) used to do statistical analysis and hypothesis testing, checking occurrences or validating linguistic rules within a specific language territory.
To perform text mining in R, there is a useful package called 'tm' which provides several functions for text handling, processing and management. The package uses the concept of a 'corpus' which is a collection of text documents to operate upon.
You should be able to just use c(), as in:
> library(tm)
> data("acq")
> data("crude")
> together <- c(acq,crude)
> acq
A corpus with 50 text documents
> crude
A corpus with 20 text documents
> together
A corpus with 70 text documents
You can find more in the tm package documentation under tm_combine.
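Applied to the directory case in the question, a minimal sketch might look like the following. The directory names and file contents are hypothetical and are created in a temporary location only so the example is self-contained. It also uses VCorpus explicitly rather than Corpus, which is an assumption made here so that the class of the result is unambiguous.

```r
library(tm)

# Hypothetical directories, created only to make the sketch self-contained
dir_a <- file.path(tempdir(), "docs_a")
dir_b <- file.path(tempdir(), "docs_b")
dir.create(dir_a, showWarnings = FALSE)
dir.create(dir_b, showWarnings = FALSE)
writeLines("first document", file.path(dir_a, "a1.txt"))
writeLines("second document", file.path(dir_a, "a2.txt"))
writeLines("a later addition", file.path(dir_b, "b1.txt"))

# Build the original corpus, then a second corpus from the new directory
myCorpus <- VCorpus(DirSource(dir_a))
newDocs  <- VCorpus(DirSource(dir_b))

# "Adding" documents is just combining the two corpora and reassigning
myCorpus <- c(myCorpus, newDocs)
length(myCorpus)
```

There is no in-place addDocs-style method here; c() returns a new corpus, so the result has to be assigned back to myCorpus.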