Rather than ask how to plot big data sets, I want to wrap plot() so that code that produces a lot of plots doesn't get hammered when it is plotting a large object. How can I wrap plot() in a very simple manner so that all of its functionality is preserved, but it first tests whether the object being passed is too large?
This code works for very vanilla calls to plot(), but it lacks the generality of plot() (see below).
    myPlot <- function(x, ...){
        isBad <- any( (length(x) > 10^6) || (object.size(x) > 8*10^6) || (nrow(x) > 10^6) )
        if(is.na(isBad)){isBad = FALSE}
        if(isBad){
            stop("No plots for you!")
        }
        return(plot(x, ...))
    }

    x = rnorm(1000)
    x = rnorm(10^6 + 1)
    myPlot(x)
An example where this fails:
    x = rnorm(1000)
    y = rnorm(1000)
    plot(y ~ x)
    myPlot(y ~ x)
Is there some easy way to wrap plot() to enable this checking of the data to be plotted, while still passing through all of the arguments? If not, then how about ggplot2? I'm an equal opportunity non-plotter. (In the cases where the dataset is large, I will use hexbin, sub-sampling, density plots, etc., but that's not the focus here.)
Note 1: When testing ideas, I recommend testing for size > 100 (or setting a variable, e.g. myThreshold <- 1000), rather than against a size of > 1M. Otherwise there will be a lot of pain in hitting the slow plotting. :)
As of 2022, the best solution is to use DuckDB (there is an R connector). It lets you query very large datasets (CSV and Parquet, among others) and comes with many functions for computing summary statistics. The idea is to use DuckDB to compute those statistics, load the results into R/Python/Julia, and plot them.
ggplot2 is a plotting package that provides helpful commands to create complex plots from data in a data frame. It provides a more programmatic interface for specifying what variables to plot, how they are displayed, and general visual properties.
The plot() function in R isn't a single function but a generic: a placeholder for a family of related methods. The method that actually gets called depends on the class of the first argument. At its simplest, plot() plots two vectors against each other; plotting x against x^2, for instance, gives a simple plot of y = x^2.
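As a small illustration of that dispatch, here is a sketch that uses only base R. The variable names are made up for the example, and pdf(NULL) is an assumption added so that the plots are discarded instead of being drawn on screen.

```r
pdf(NULL)  # assumption: throw the graphics output away; drop this line to see the plots

x <- seq(-3, 3, by = 0.1)
y <- x^2

plot(x, y)   # x is a plain numeric vector, so plot.default() is used
plot(y ~ x)  # the first argument is a formula, so plot.formula() is used

class(x)      # "numeric"
class(y ~ x)  # "formula"

# The formula method is registered with the plot() generic:
is.function(getS3method("plot", "formula"))

dev.off()
```

Running methods(plot) lists every method currently registered for the generic, which is a quick way to see what a wrapper would have to cope with.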
The problem you have is that, as currently coded, myPlot() assumes x is a data object, but then you try to pass it a formula. R's plot() achieves this via methods: when x is a formula, the plot.formula() method gets dispatched to instead of the basic plot.default() method.
You need to do the same:
    myplot <- function(x, ...)
        UseMethod("myplot")

    myplot.default <- function(x, ...) {
        ## nrow() is NULL for plain vectors, which can make the test NA;
        ## the is.na() check below treats that case as "not too big"
        isBad <- any((length(x) > 10^6) || (object.size(x) > 8*10^6) ||
                     (nrow(x) > 10^6))
        if(is.na(isBad)){isBad = FALSE}
        if(isBad){
            stop("No plots for you!")
        }
        invisible(plot(x, ...))
    }

    myplot.formula <- function(x, ...) {
        ## code here to process the formula into a data object for plotting
        ## ....
        myplot.default(processed_x, ...)
    }
You can steal code from plot.formula() to use in the code needed to process x into an object. Alternatively, you can roll your own following the standard non-standard evaluation rules (PDF).
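To make the formula method a little more concrete, here is one possible sketch. It does not copy code from plot.formula(); instead it uses model.frame() to pull out the data only to count rows, and then hands the original formula back to plot(). The names my_threshold and check_size, the data argument, and the choice to count observations with NROW() instead of using the object.size() test are all assumptions made for the illustration.

```r
my_threshold <- 1000  # hypothetical limit, kept small so that testing stays fast

myplot <- function(x, ...) UseMethod("myplot")

# Hypothetical helper: stop if there are more than `limit` observations.
check_size <- function(n, limit = my_threshold) {
  if (n > limit) stop("No plots for you!")
  invisible(TRUE)
}

myplot.default <- function(x, ...) {
  # NROW() is length() for vectors and nrow() for matrices and data frames
  check_size(NROW(x))
  invisible(plot(x, ...))
}

myplot.formula <- function(x, data = NULL, ...) {
  # Build the model frame only to find out how many rows would be plotted
  mf <- if (is.null(data)) {
    model.frame(x, na.action = NULL)
  } else {
    model.frame(x, data = data, na.action = NULL)
  }
  check_size(nrow(mf))
  # Hand the untouched formula to plot() so that plot.formula() does the real work
  if (is.null(data)) {
    invisible(plot(x, ...))
  } else {
    invisible(plot(x, data = data, ...))
  }
}
```

With these definitions, myplot(y ~ x) and myplot(y ~ x, data = d) go through the same size check as myplot(x). Other classes that plot() knows about (for example fitted models) still fall through to myplot.default(), so they would need their own methods if their size should be measured differently.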