Figure sizes with pandoc conversion from markdown to docx

Tags: markdown, image, knitr, pandoc, docx

I am writing a report with R Markdown in RStudio. When I convert it to HTML with knitr, knitr also produces a markdown file. I convert this file with pandoc as follows:

pandoc -f markdown -t docx input.md -o output.docx

The output.docx file looks fine except for one problem: the figure sizes are altered, and I have to resize the figures manually in Word. Is there a way, perhaps a pandoc option, to get the right figure sizes?

asked Feb 12 '13 09:02

Stéphane Laurent

3 Answers

Here is a solution that resizes the figures with ImageMagick from an R script. A 70% ratio seems to work well.

# the path containing the Rmd file :
wd <- "..."
setwd(wd)

# the folder containing the figures :
fig.path <- paste0(wd, "/figure")
# all png figures (anchored regex, so only names ending in .png match) :
figures <- list.files(fig.path, pattern = "\\.png$", all.files = TRUE)

# (safety) create copies of the original files
copy.path <- paste0(fig.path, "_copy")
dir.create(copy.path)
for (f in figures) {
  file.copy(file.path(fig.path, f), copy.path)
}

# resize all figures in place
for (f in figures) {
  fig <- shQuote(file.path(fig.path, f))
  comm <- paste("convert", fig, "-resize 70%", fig)
  # system() is portable; the original used shell(), which exists only on Windows
  system(comm)
}

# then run pandoc from a command line  
# or from the pandoc() function :
library(knitr)
pandoc("MyReport.md", "docx")

More info about the resize function of ImageMagick: www.perturb.org
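
If you would rather check the commands before touching any file, the loop above can be turned into a dry run. This is only a sketch: `resize_commands` and the `figure_resized` output folder are names I made up for illustration, and it assumes the ImageMagick `convert` executable is on the PATH (newer ImageMagick versions call it `magick`).

```r
# Build (but do not run) one ImageMagick command per figure.
# `resize_commands` and "figure_resized" are hypothetical names, not part of the answer.
resize_commands <- function(files, out.dir, percent = 70) {
  # paste() would return one malformed string for empty input, so stop early
  if (length(files) == 0) return(character(0))
  out <- file.path(out.dir, basename(files))
  # shQuote() protects paths that contain spaces
  paste("convert", shQuote(files), "-resize", paste0(percent, "%"), shQuote(out))
}

figs <- c("figure/plot 1.png", "figure/plot2.png")
cmds <- resize_commands(figs, "figure_resized")
print(cmds)

# To actually run them:
# dir.create("figure_resized"); for (cmd in cmds) system(cmd)
```

Writing to a separate folder keeps the originals untouched, so the safety copy is no longer needed.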

answered Oct 05 '22 01:10

Stéphane Laurent

I also want to convert an R Markdown document into both HTML and .docx/.odt with figures at the right size and resolution. So far, the best way I have found is to define the resolution and size of the graphs explicitly in the .md document (dpi, fig.width and fig.height options). If you do this, you get graphs suitable for publication and the odt/docx is fine. The problem is that if you use a dpi much higher than the default of 72, the graphs look too big in the HTML file. Here are three approaches I have used to handle this (NB: I use R scripts with spin() syntax):

1) Use out.extra = 'WIDTH="75%"' in the knitr options. This forces every graph in the HTML to occupy 75% of the window width. It is a quick solution but not optimal if your plots have very different sizes. (NB: I prefer working in centimetres rather than inches, hence the /2.54 everywhere.)

library(knitr)
opts_chunk$set(echo = FALSE, dev = c("png", "pdf"), dpi = 400,
               fig.width = 8/2.54, fig.height = 8/2.54,
               out.extra ='WIDTH="75%"'
)

data(iris)

#' # Iris dataset
summary(iris)
boxplot(iris[,1:4])

#+ fig.width=14/2.54, fig.height=10/2.54
par(mar = c(2,2,2,2))
pairs(iris[,-5])

2) Use out.width and out.height to specify the size of the graphs in pixels in the HTML file. I use a constant "sc" to scale down the size of the plots in the HTML output. This is the more precise approach, but for each graph you have to define both fig.width/fig.height and out.width/out.height, which is really tedious! Ideally you would be able to specify in the global options that, e.g., out.width = 150*fig.width (where fig.width changes from chunk to chunk). Maybe something like that is possible, but I don't know how.

#+ echo = FALSE
library(knitr)
sc <- 150
opts_chunk$set(echo = FALSE, dev = c("png", "pdf"), dpi = 400,
                fig.width = 8/2.54, fig.height = 8/2.54,
                out.width = sc*8/2.54, out.height = sc*8/2.54
)

data(iris)

#' # Iris dataset
summary(iris)
boxplot(iris[,1:4])

#+ fig.width=14/2.54, fig.height=10/2.54, out.width= sc * 14/2.54, out.height= sc * 10/2.54
par(mar = c(2,2,2,2))
pairs(iris[,-5])
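
Regarding the wish in approach 2 for a global rule such as out.width = 150*fig.width: knitr's option hooks (`knitr::opts_hooks`) look like a possible way to get this, assuming a knitr version recent enough to provide them. The sketch below is untested against the original scripts; `scale_out_size` is a hypothetical name, and the hook only fills in out.width/out.height when a chunk has not set them itself.

```r
# Option hook: derive the HTML display size (pixels) from the figure size (inches).
# `sc` follows the answer above; `scale_out_size` is a hypothetical helper name.
sc <- 150
scale_out_size <- function(options) {
  if (is.null(options$out.width))  options$out.width  <- round(sc * options$fig.width)
  if (is.null(options$out.height)) options$out.height <- round(sc * options$fig.height)
  options
}

# Register the hook only if knitr is installed. A hook attached to fig.width
# runs for every chunk, because fig.width always has a (default) value.
if (requireNamespace("knitr", quietly = TRUE)) {
  knitr::opts_hooks$set(fig.width = scale_out_size)
}

# Stand-alone check, with a plain list standing in for real chunk options
opts <- scale_out_size(list(fig.width = 14 / 2.54, fig.height = 10 / 2.54))
print(opts$out.width)
print(opts$out.height)
```

With the hook registered in the first chunk, each later chunk would only need its own fig.width and fig.height.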

Note that with these two solutions, I think you cannot convert your md file directly into odt with pandoc (the figures are not included). I convert the md into HTML and then the HTML into odt (I haven't tried docx). Something like this (if the previous R script is named "figsize1.R"):

library(knitr)
setwd("/home/gilles/")
spin("figsize1.R")

system("pandoc figsize1.md -o figsize1.html")
system("pandoc figsize1.html -o figsize1.odt")

3) Simply compile your document twice: once with a low dpi value (~96) for the HTML output and once with a high resolution (~300) for the odt/docx output. This is now my preferred way. The main disadvantage is that you must compile twice, but this is not really a problem for me since I generally need the odt file only at the very end of the job, to give to end users. During the work I regularly compile the HTML with the HTML notebook button in RStudio.

#+ echo = FALSE
library(knitr)

opts_chunk$set(echo = FALSE, dev = c("png", "pdf"), 
               fig.width = 8/2.54, fig.height = 8/2.54
)

data(iris)

#' # Iris dataset
summary(iris)
boxplot(iris[,1:4])

#+ fig.width=14/2.54, fig.height=10/2.54
par(mar = c(2,2,2,2))
pairs(iris[,-5])

Then compile the two outputs with the following script (NB: here you can convert the md file directly into HTML):

library(knitr)
setwd("/home/gilles")

opts_chunk$set(dpi=96)
spin("figsize3.R", knit=FALSE)
knit2html("figsize3.Rmd")

opts_chunk$set(dpi=400)
spin("figsize3.R")
system("pandoc figsize3.md -o figsize3.odt")
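
To see why a high dpi makes the graphs look too big in HTML, it helps to work out the pixel sizes: a raster figure is roughly its size in inches times the dpi. A small sketch (`fig_pixels` is a hypothetical helper, and the exact pixel counts produced by the graphics device may differ slightly):

```r
# Approximate pixel dimensions of a raster figure: inches * dpi.
# `fig_pixels` is a hypothetical helper; sizes are given in cm as in the scripts above.
fig_pixels <- function(width_cm, height_cm, dpi) {
  round(c(width = width_cm, height = height_cm) / 2.54 * dpi)
}

low  <- fig_pixels(8, 8, dpi = 96)   # HTML pass
high <- fig_pixels(8, 8, dpi = 400)  # odt/docx pass
print(low)
print(high)
```

The same 8 cm figure is about four times wider in pixels at 400 dpi than at 96 dpi, which is what a browser shows when no display size is given.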

answered Oct 05 '22 03:10

Gilles

Here is my solution: hack the docx produced by Pandoc. A docx is simply a bundle of XML files, so adjusting the figure sizes is pretty straightforward.

The following is what a figure looks like in the word/document.xml extracted from a converted docx:

<w:p>
  <w:r>
    <w:drawing>
      <wp:inline>
        <wp:extent cx="1524000" cy="1524000" />
        ...
        <a:graphic>
          <a:graphicData uri="http://schemas.openxmlformats.org/drawingml/2006/picture">
            <pic:pic>
              ...
              <pic:blipFill>
                <a:blip r:embed="rId23" />
                ...
              </pic:blipFill>
              <pic:spPr bwMode="auto">
                <a:xfrm>
                  <a:off x="0" y="0" />
                  <a:ext cx="1524000" cy="1524000" />
                </a:xfrm>
                ...
              </pic:spPr>
            </pic:pic>
          </a:graphicData>
        </a:graphic>
      </wp:inline>
    </w:drawing>
  </w:r>
</w:p>

So substituting the cx and cy attributes of the wp:extent and a:ext nodes with the desired values does the resizing. The following R code works for me. The widest figure takes up a whole line's width, specified by the variable out.width, and the rest are resized proportionally.

library(XML)

## default linewidth (inch) for Word 2003
out.width <- 5.77
docx.file <- "report.docx"

## unzip the docx converted by Pandoc
system(paste("unzip", docx.file, "-d temp_dir"))
document.xml <- "temp_dir/word/document.xml"
doc <- xmlParse(document.xml)
wp.extent <- getNodeSet(xmlRoot(doc), "//wp:extent")
a.blip <- getNodeSet(xmlRoot(doc), "//a:blip")
a.ext <- getNodeSet(xmlRoot(doc), "//a:ext")

figid <- sapply(a.blip, xmlGetAttr, "r:embed")
figname <- dir("temp_dir/word/media/")
stopifnot(length(figid) == length(figname))
pdffig <- paste("temp_dir/word/media/",
                ## in case figure ids in docx are not in dir'ed order;
                ## index figname itself so the match() positions line up
                figname[match(figid, tools::file_path_sans_ext(figname))], sep="")

## get dimension info of included pdf figures
pdfsize <- do.call(rbind, lapply(pdffig, function (x) {
    fig.ext <- tools::file_ext(x)
    ## pdfinfo reports the page size of pdf files, file reports the pixel size of images
    pp <- pipe(paste(ifelse(fig.ext == 'pdf', "pdfinfo", "file"), shQuote(x), sep=" "))
    pdfinfo <- readLines(pp); close(pp)
    sizestr <- unlist(regmatches(pdfinfo, gregexpr("[[:digit:].]+ X [[:digit:].]+", pdfinfo, ignore.case=TRUE)))
    ## split on either case, since the match above ignores case
    as.numeric(strsplit(sizestr, split=" [xX] ")[[1]])
}))

## resizing pdf figures in xml DOM, with the widest figure taking up a line's width
wp.cx <- round(out.width*914400*pdfsize[,1]/max(pdfsize[,1]))
wp.cy <- round(wp.cx*pdfsize[, 2]/pdfsize[, 1])
wp.cx <- as.character(wp.cx)
wp.cy <- as.character(wp.cy)
sapply(1:length(wp.extent), function (i)
       xmlAttrs(wp.extent[[i]]) <- c(cx = wp.cx[i], cy = wp.cy[i]));
sapply(1:length(a.ext), function (i)
       xmlAttrs(a.ext[[i]]) <- c(cx = wp.cx[i], cy = wp.cy[i]));

## save hacked xml back to docx
saveXML(doc, document.xml, indent = FALSE)
setwd("temp_dir")
system(paste("zip -r ../", docx.file, " *", sep=""))
setwd("..")
## portable replacement for rm -fr
unlink("temp_dir", recursive = TRUE)
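
For anyone wondering about the 914400 factor in the script above: cx and cy are in English Metric Units (EMU), and there are 914400 EMU per inch. The scaling step can be tried on its own with the sketch below, which needs neither the XML package nor a docx file (`scale_to_emu` is a hypothetical helper name, and the two figure sizes are made-up inputs).

```r
EMU_PER_INCH <- 914400

# Scale figures so that the widest one spans `out.width` inches, keeping aspect ratios.
# `scale_to_emu` is a hypothetical helper; width and height are the native figure
# sizes in any common unit (pixels or points).
scale_to_emu <- function(width, height, out.width = 5.77) {
  cx <- round(out.width * EMU_PER_INCH * width / max(width))
  cy <- round(cx * height / width)
  data.frame(cx = cx, cy = cy)
}

# Two made-up figures: 640 x 480 and 320 x 320
sizes <- scale_to_emu(width = c(640, 320), height = c(480, 320))
print(sizes)

# The extent in the XML sample above, converted back to inches
print(1524000 / EMU_PER_INCH)
```

The first figure gets the full line width and the second gets half of it, each with its original aspect ratio.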

answered Oct 05 '22 03:10

lcn
