Is plot() the most efficient way to plot 100 million or so data points in R? I'd like to plot a bunch of these Clifford Attractors. Here's an example of one I've downscaled from a very large image:
Here is a link to some code that I've used to plot very large 8K (7680x4320) images.
It doesn't take long to generate 50 or 100 million points (using Rcpp), nor to get the hex value for the colour + transparency, but the actual plotting and saving to disk is extremely slow.
Edit: code used
    # Load packages
    library(Rcpp)
    library(viridis)

    # Output parameters
    output_width = 1920 * 4
    output_height = 1080 * 4
    N_points = 50e6
    point_alpha = 0.05 # point transparency

    # Attractor parameters
    params <- c(1.886, -2.357, -0.328, 0.918)

    # C++ function to rapidly generate points
    cliff_rcpp <- cppFunction(
    "
    NumericMatrix cliff(int nIter, double A, double B, double C, double D) {
      NumericMatrix x(nIter, 2);
      for (int i=1; i < nIter; ++i) {
        x(i,0) = sin(A*x(i-1,1)) + C*cos(A*x(i-1,0));
        x(i,1) = sin(B*x(i-1,0)) + D*cos(B*x(i-1,1));
      }
      return x;
    }"
    )

    # Function for mapping a point to a colour
    map2color <- function(x, pal, limits = NULL) {
      if (is.null(limits)) limits = range(x)
      pal[findInterval(x,
                       seq(limits[1], limits[2], length.out = length(pal) + 1),
                       all.inside = TRUE)]
    }

    # Obtain matrix of points
    cliff_points <- cliff_rcpp(N_points, params[1], params[2], params[3], params[4])

    # Calculate angle between successive points
    cliff_angle <- atan2(
      (cliff_points[, 1] - c(cliff_points[-1, 1], 0)),
      (cliff_points[, 2] - c(cliff_points[-1, 2], 0))
    )

    # Obtain colours for points
    available_cols <- viridis(
      1024,
      alpha = point_alpha,
      begin = 0,
      end = 1,
      direction = 1
    )
    cliff_cols <- map2color(
      cliff_angle,
      c(available_cols, rev(available_cols))
    )

    # Output image directly to disk
    jpeg(
      "clifford_attractor.jpg",
      width = output_width,
      height = output_height,
      pointsize = 1,
      bg = "black",
      quality = 100
    )
    plot(
      cliff_points[-1, ],
      bg = "black",
      pch = ".",
      col = cliff_cols
    )
    dev.off()
As of 2022, the best solution is to use DuckDB (there is an R connector). It lets you query very large datasets (CSV, Parquet, among others) and comes with many functions for computing summary statistics. The idea is to compute those statistics in DuckDB, load only the summarised result into R/Python/Julia, and plot that instead of the raw points.
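A rough sketch of that idea from R, assuming the DBI and duckdb packages are installed. The table name `pts`, the bin count, the fixed [-2, 2] coordinate range and the synthetic uniform data are all made up for illustration; in practice the points would come from the attractor code or from a file on disk. DuckDB counts the points that fall in each grid cell, and only that small table of counts is plotted.

```r
library(DBI)
library(duckdb)

# Synthetic stand-in for the attractor points (hypothetical data)
set.seed(1)
pts <- data.frame(x = runif(1e5, -2, 2), y = runif(1e5, -2, 2))

# In-memory DuckDB database; expose the data frame as a table named "pts"
con <- dbConnect(duckdb::duckdb())
duckdb::duckdb_register(con, "pts", pts)

# Bin both coordinates into an n_bins x n_bins grid and count points per cell
n_bins <- 200L
sql <- sprintf("
  SELECT CAST(floor((x + 2) / 4 * %d) AS INTEGER) AS bin_x,
         CAST(floor((y + 2) / 4 * %d) AS INTEGER) AS bin_y,
         count(*) AS n
  FROM pts
  GROUP BY 1, 2", n_bins, n_bins)
counts <- dbGetQuery(con, sql)
dbDisconnect(con, shutdown = TRUE)

# Turn the per-cell counts into a matrix (bins are 0-based in the query)
grid <- matrix(0, nrow = n_bins, ncol = n_bins)
grid[cbind(counts$bin_x + 1L, counts$bin_y + 1L)] <- as.numeric(counts$n)

# Plot the density on a log scale: at most n_bins^2 cells instead of every point
image(log1p(grid), col = hcl.colors(256, "viridis"), axes = FALSE)
```

The plotting cost then depends on the grid size rather than on the number of points, at the price of losing per-point colour and transparency.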
Change R base plot point shapes: you can change the default point shape (pch = 1, an open circle) to pch = 19 (solid circle) or to pch = 21 (filled circle). To change the color of points, use the col argument (hexadecimal color code or color name), for example col = "blue" or col = "#4F6228".
Plots in R. There are three basic plotting functions in R: high-level plots, low-level plots, and the layout command par. Basically, a high-level plot function creates a complete plot and a low-level plot function adds to an existing plot, that is, one created by a high-level plot command.
Plot character or pch is the standard argument to set the character that will be plotted in a number of R functions. Explanatory text can be added to a plot in several different forms, including axis labels, titles, legends, or a text added to the plot itself.
I've recently discovered the scattermore package for R, which is about an order of magnitude faster than R's standard plot function. scattermoreplot() takes ~2 minutes to plot 100m points with colour and transparency, while plot() takes around half an hour.
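For reference, a minimal sketch of what such a call might look like, assuming the scattermore package is installed. The data here are synthetic normal draws rather than attractor points, the point count is kept far below 100 million, and the output file name is hypothetical. The per-point colour vector mirrors the colour-plus-alpha approach from the question.

```r
library(scattermore)

# Synthetic stand-in data (hypothetical); scale n up for a real run
set.seed(42)
n <- 1e6
x <- rnorm(n)
y <- rnorm(n)

# One semi-transparent colour per point, chosen by the point's angle
pal <- hcl.colors(256, "viridis", alpha = 0.05)
cols <- pal[cut(atan2(y, x), breaks = 256, labels = FALSE)]

# Write straight to disk, as in the question
out_file <- file.path(tempdir(), "scattermore_demo.png")
png(out_file, width = 1920, height = 1080, bg = "black")
par(mar = c(0, 0, 0, 0))
scattermoreplot(x, y, col = cols, cex = 0)  # cex = 0 draws single-pixel points
dev.off()
```

scattermoreplot() rasterises the points itself and hands the graphics device a single image, which is presumably where most of the speed-up over plot() comes from.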
I am currently exploring datashader (http://www.datashader.org). If you are willing to work with Python, this could be an elegant solution to the problem.