
Knitr providing different results than RStudio


I'm doing some initial text mining with 'tm' and 'RWeka', using knitr for reproducibility.

I'm trying to build a term-document matrix for a corpus made from two text files. The results differ between running the code in RStudio and knitting it to an HTML file.

When I knit to the other output formats, PDF and Word, the results agree with RStudio.

However, I need HTML output.

Any idea what may be going on?

Here is the .Rmd code:

---
title: "test"
author: "me"
output: word_document
---

```{r init, echo=FALSE, warning=FALSE, cache=TRUE, message=FALSE}
options(mc.cores=1) # some problems with parallel processing
library(tm)         # provides Corpus, tm_map, TermDocumentMatrix
```

```{r 1-gram-test, echo=FALSE, eval=TRUE,cache=TRUE}
doc1 <- c("en un lugar de la mancha de cuyo nombre no quiero acordarme habitaba un hidalgo de los de adarga antigual, rocín flaco y galgo corredor")
doc2 <- c("había una vez un barquito chiquitito, que no sabía, que no sabía, que no sabía navegar... pasaron un dos tres cuatro cinco seis semanas y el barquito navegó.")
docs <- c(doc1, doc2)
es <- Corpus(VectorSource(docs),
         readerControl = list(reader = readPlain,
                              language = "ES-es", load = TRUE))
# convert to plain text
es1 <- tm_map(es, PlainTextDocument)

# unigram tokenizer built on RWeka
monogramtok <- function(x) {
    RWeka::NGramTokenizer(x, RWeka::Weka_control(min = 1, max = 1))
}

# term-document matrix with tm's default tokenizer
es_tdm1 <- TermDocumentMatrix(es1)

# term-document matrix with the RWeka tokenizer, keeping terms of any length
esmono_tdm1 <- TermDocumentMatrix(es1,
                                 control = list(tokenize = monogramtok,
                                                wordLengths = c(1, Inf)))
```

sessionInfo()
R version 3.2.3 (2015-12-10)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.11.4 (El Capitan)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] R.utils_2.2.0 R.oo_1.20.0 R.methodsS3_1.7.1 dplyr_0.4.3 xtable_1.8-0
[6] pander_0.6.0 RWeka_0.4-24 SnowballC_0.5.1 tm_0.6-2 NLP_0.1-9
[11] knitr_1.12.3

ines vidal asked Mar 30 '16 02:03



1 Answer

I had a similar problem, then realized I was caching my knitr chunks with the option cache=TRUE (as you seem to have set as well).

This can cause some really subtle errors if the cached chunks have side effects or depend on external resources.

When I disabled caching, my reproducibility issues disappeared.
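
A minimal sketch of how caching could be switched off, assuming a stale cache is indeed the cause here. `knitr::opts_chunk$set()` sets chunk defaults only, so any chunk header that explicitly says `cache=TRUE` (as both chunks in the question do) still needs that option changed or removed. The directory name `test_cache` is an assumption for illustration; use whatever cache directory sits next to your .Rmd file.

```r
# Put this in the first chunk of the .Rmd to make "no caching" the default.
# Chunk headers that explicitly set cache=TRUE override this default,
# so also change them to cache=FALSE (or drop the option).
knitr::opts_chunk$set(cache = FALSE)

# Optionally delete results cached by earlier runs so nothing stale is reused.
# "test_cache" is a hypothetical directory name, not taken from the question.
unlink("test_cache", recursive = TRUE)
```

As far as I know, R Markdown keeps a separate cache per output format, which might explain why only the HTML output disagreed while PDF and Word matched RStudio.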

jayelm answered Nov 15 '22 06:11
