
Efficiently plotting hundreds of millions of points in R

Is plot() the most efficient way to plot 100 million or so data points in R? I'd like to plot a bunch of these Clifford Attractors. Here's an example of one I've downscaled from a very large image:

[Image: A Clifford attractor]

Here is a link to some code that I've used to plot very large 8K (7680x4320) images.

It doesn't take long to generate 50 or 100 million points (using Rcpp), nor to get the hex value for the colour + transparency, but the actual plotting and saving to disk is extremely slow.

  • Is there a faster way to plot (and save) all these points?
  • Is R just a bad tool for this job?
  • What tools would you use to plot billions of points, even if you couldn't fit them all into RAM?
  • How would one have made a very high resolution plot of this type (colour + transparency) with, say, 1990s software and hardware?

Edit: code used

    # Load packages
    library(Rcpp)
    library(viridis)

    # Output parameters
    output_width = 1920 * 4
    output_height = 1080 * 4
    N_points = 50e6
    point_alpha = 0.05 # point transparency

    # Attractor parameters
    params <- c(1.886, -2.357, -0.328, 0.918)

    # C++ function to rapidly generate points
    cliff_rcpp <- cppFunction(
        "
        NumericMatrix cliff(int nIter, double A, double B, double C, double D) {
            NumericMatrix x(nIter, 2);
            for (int i = 1; i < nIter; ++i) {
                x(i,0) = sin(A*x(i-1,1)) + C*cos(A*x(i-1,0));
                x(i,1) = sin(B*x(i-1,0)) + D*cos(B*x(i-1,1));
            }
            return x;
        }"
    )

    # Function for mapping a point to a colour
    map2color <- function(x, pal, limits = NULL) {
        if (is.null(limits))
            limits = range(x)
        pal[findInterval(x,
                         seq(limits[1], limits[2], length.out = length(pal) + 1),
                         all.inside = TRUE)]
    }

    # Obtain matrix of points
    cliff_points <- cliff_rcpp(N_points, params[1], params[2], params[3], params[4])

    # Calculate angle between successive points
    cliff_angle <- atan2(
        (cliff_points[, 1] - c(cliff_points[-1, 1], 0)),
        (cliff_points[, 2] - c(cliff_points[-1, 2], 0))
    )

    # Obtain colours for points
    available_cols <-
        viridis(
            1024,
            alpha = point_alpha,
            begin = 0,
            end = 1,
            direction = 1
        )

    cliff_cols <- map2color(
        cliff_angle,
        c(available_cols, rev(available_cols))
    )

    # Output image directly to disk
    jpeg(
        "clifford_attractor.jpg",
        width = output_width,
        height = output_height,
        pointsize = 1,
        bg = "black",
        quality = 100
    )
    plot(
        cliff_points[-1, ],
        bg = "black",
        pch = ".",
        col = cliff_cols
    )
    dev.off()
dcl asked Jul 01 '18 11:07




2 Answers

I've recently discovered the scattermore package for R, which is about an order of magnitude faster than R's standard plot() function. scattermoreplot() takes ~2 minutes to plot 100 million points with colour and transparency, while plot() takes around half an hour.
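For reference, a minimal sketch of what a scattermoreplot() call might look like. The random points, the image size, and the file name scattermore_demo.png are placeholders chosen for illustration, not taken from the question's code, and the block only runs if scattermore is installed.

```r
# Sketch only: requires the scattermore package. Random points stand in
# for attractor output, and the file name is a placeholder.
if (requireNamespace("scattermore", quietly = TRUE)) {
    n  <- 1e6
    xy <- cbind(rnorm(n), rnorm(n))

    png("scattermore_demo.png", width = 1920, height = 1080, bg = "black")
    scattermore::scattermoreplot(
        xy[, 1], xy[, 2],
        col = rgb(1, 1, 1, 0.05),  # white at 5% alpha, as in the question
        cex = 0                    # smallest point size
    )
    dev.off()
}
```

A per-point colour vector such as cliff_cols from the question should be usable in place of the single colour, since col is passed per point.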

dcl answered Oct 03 '22 14:10


I am currently exploring datashader (http://www.datashader.org). If you are willing to work with Python, this could be an elegant solution to the problem.
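The general idea datashader is built around (aggregate points into a pixel grid, then colour the aggregated counts) can also be sketched in base R. This is not datashader's API, only a rough illustration of the approach; the grid size, chunk sizes, random stand-in data, and the file name binned_demo.png are all assumptions made for the example. Because points are processed in chunks and only the count grid is kept, the full point set never has to sit in memory at once.

```r
# Sketch only: bin points into a pixel grid and colour the per-pixel counts.
# Random points stand in for attractor output; all sizes are illustrative.
width      <- 640
height     <- 360
n_chunks   <- 10
chunk_size <- 1e5

counts <- numeric(width * height)   # one accumulator per pixel
xlim <- c(-4, 4)
ylim <- c(-4, 4)

set.seed(1)
for (k in seq_len(n_chunks)) {
    # In real use, generate or read the next chunk of points here
    x <- rnorm(chunk_size)
    y <- rnorm(chunk_size)

    # Drop points outside the plotting window
    keep <- x >= xlim[1] & x < xlim[2] & y >= ylim[1] & y < ylim[2]

    # Map coordinates to 1-based pixel column and row indices
    col_idx <- pmin(floor((x[keep] - xlim[1]) / diff(xlim) * width) + 1, width)
    row_idx <- pmin(floor((y[keep] - ylim[1]) / diff(ylim) * height) + 1, height)

    # Linear (column-major) pixel index, then per-pixel counts for this chunk
    idx <- (col_idx - 1) * height + row_idx
    counts <- counts + tabulate(idx, nbins = width * height)
}

# Log-scale the counts so dense and sparse regions are both visible
img <- matrix(log1p(counts), nrow = height, ncol = width)

png("binned_demo.png", width = width, height = height)
par(mar = c(0, 0, 0, 0))
image(
    x = seq_len(width), y = seq_len(height), z = t(img),
    col = hcl.colors(256, "Viridis"), axes = FALSE, useRaster = TRUE
)
dev.off()
```

The cost of the drawing step depends on the number of pixels rather than the number of points, which is why this kind of aggregation scales to very large inputs.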

Nairolf answered Oct 03 '22 12:10