Reduce PDF file size of plots by filtering hidden objects

Question

While producing scatter plots of many points in R (using ggplot() for example), there might be many points that are behind the others and not visible at all. For instance see the plot below:

enter image description here

This is a scatter plot of several hundreds of thousands points, but most of them are behind the other points. The problem is when casting the output to a vector file (a PDF file for example), the invisible points make the file size so big, and increase memory and cpu usage while viewing the file.

A simple solution is to cast the output to a bitmap picture (TIFF or PNG for example), but they lose the vector quality and can be even larger in size. I tried some online PDF compressors, but the result was the same size as my original file.

Is there any good solution? For example some way to filter the points that are not visible, possibly during generating plot or after it by editing PDF file?

Roland · Accepted Answer

As a start you can do something like this:

set.seed(42)
DF <- data.frame(x=x<-runif(1e6),y=x+rnorm(1e6,sd=0.1))
plot(y~x,data=DF,pch=".",cex=4)

enter image description here

PDF size: 6334 KB

DF2 <- data.frame(x=round(DF$x,3),y=round(DF$y,3))
DF2 <- DF[!duplicated(DF2),]
nrow(DF2)
#[1] 373429
plot(y~x,data=DF2,pch=".",cex=4)

enter image description here

PDF size: 2373 KB

With the rounding you can control how many values you want to remove. You only need to modify this to handle the different colours.

Paul Hiemstra · Answer

Simply saving the plot as a high-res png file will very drastically cut the size, while keeping the quality more than good enough. At least I've never had journals complain about any of the png's I sent them, just keep sure to use > 600 dpi.

Reduce PDF file size of plots by filtering hidden objects

Tags:

plot

r

pdf

vector

ggplot2

Ali

2 Answers

Roland

Paul Hiemstra

Recent Activity

Donate For Us

Reduce PDF file size of plots by filtering hidden objects

Tags:

plot

r

pdf

vector

ggplot2

Ali

2 Answers

Roland

Paul Hiemstra

Related questions

Recent Activity

Donate For Us