
Knitr providing different results than RStudio


I'm doing some initial text mining with 'tm' and 'RWeka', using knitr for reproducibility.

I'm trying to obtain a term-document matrix for a corpus based on two text files, and I get different results when I run the code in RStudio and when I knit it to an HTML file.

When I knit to other output formats (PDF and Word), the results agree with RStudio.

However, I need HTML output.

Any idea of what may be going on?

Here is the .Rmd code:

---
title: "test"
author: "me"
output: word_document
---

```{r init, echo=FALSE, warning=FALSE, cache=TRUE, message=FALSE}
library(knitr)
library(tm)
library(SnowballC)
library(RWeka)
setwd("~")
options(mc.cores=1) # some problems with parallel processing
```
```{r 1-gram-test, echo=FALSE, eval=TRUE,cache=TRUE}

doc1 <- c("en un lugar de la mancha de cuyo nombre no quiero acordarme habitaba un hidalgo de los de adarga antigual, rocín flaco y galgo corredor")
doc2 <- c("había una vez un barquito chiquitito, que no sabía, que no sabía, que no sabía navegar... pasaron un dos tres cuatro cinco seis semanas y el barquito navegó.")
docs <- c(doc1, doc2)
es <- Corpus(VectorSource(docs),
         readerControl = list(reader = readPlain,
                              language = "ES-es", load = TRUE))
es
# convert to plain text
es1 <- tm_map(es, PlainTextDocument)

monogramtok <- function(x) {
    RWeka::NGramTokenizer(x, RWeka::Weka_control(min = 1, max = 1))
}

es_tdm1 <- TermDocumentMatrix(es1)

esmono_tdm1 <- TermDocumentMatrix(es1, 
                                 control = list(tokenize = monogramtok, 
                                                wordLengths = c(1, Inf))) #,                               

printf("es_tdm1")
es_tdm1

printf("esmono_tdm1")
esmono_tdm1

```

sessionInfo()
R version 3.2.3 (2015-12-10)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.11.4 (El Capitan)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] R.utils_2.2.0 R.oo_1.20.0 R.methodsS3_1.7.1 dplyr_0.4.3 xtable_1.8-0
[6] pander_0.6.0 RWeka_0.4-24 SnowballC_0.5.1 tm_0.6-2 NLP_0.1-9
[11] knitr_1.12.3

ines vidal asked Mar 30 '16 02:03




1 Answer

I had a similar problem, then realized I was caching my knitr chunks with the option cache=TRUE (as you seem to have set as well).

This can cause some really subtle errors if the cached chunks have side effects or depend on external resources.

When I disabled caching, my reproducibility issues disappeared.
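As a rough sketch of what disabling caching can look like for the .Rmd in the question: the snippet below would go in the first chunk of the document. `knitr::opts_chunk$set()` and `unlink()` are standard calls, but the cache directory name `test_cache` is an assumption (rmarkdown usually names the cache folder after the document, so check what actually exists next to your .Rmd before deleting anything).

```r
# Configuration sketch: place in the first chunk of the .Rmd.

# Turn caching off for all chunks, so every knit re-runs the code
# instead of reusing results stored by an earlier run.
knitr::opts_chunk$set(cache = FALSE)

# Remove any cache left over from previous knits.
# ASSUMPTION: the document is "test.Rmd", so the cache folder is "test_cache";
# adjust the name to match the folder you actually see on disk.
cache_dir <- "test_cache"
unlink(cache_dir, recursive = TRUE)
```

A chunk-level `cache=TRUE` in a chunk header overrides the global default, so the `cache=TRUE` options on the `init` and `1-gram-test` chunks would also need to be removed or set to `cache=FALSE`.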

jayelm answered Nov 15 '22 06:11
