Wrapping R's plot function (or ggplot2) to prevent plotting of large data sets

Tags: plot, r, ggplot2, bigdata

Rather than ask how to plot big data sets, I want to wrap plot so that code that produces a lot of plots doesn't get hammered when it is plotting a large object. How can I wrap plot in a very simple manner so that all of its functionality is preserved, but so that it first tests whether the object being passed is too large?

This code works for very vanilla calls to plot, but it lacks the generality of plot (see below).

myPlot <- function(x, ...){
    isBad <- any( (length(x) > 10^6) || (object.size(x) > 8*10^6) || (nrow(x) > 10^6) )
    if(is.na(isBad)){isBad = FALSE}
    if(isBad){
        stop("No plots for you!")
    }
    return(plot(x, ...))
}

x = rnorm(1000)
x = rnorm(10^6 + 1)

myPlot(x)

An example where this fails:

x = rnorm(1000)
y = rnorm(1000)
plot(y ~ x)
myPlot(y ~ x)

Is there some easy way to wrap plot to enable this checking of the data to be plotted, while still passing through all of the arguments? If not, then how about ggplot2? I'm an equal opportunity non-plotter. (In the cases where the dataset is large, I will use hexbin, sub-sampling, density plots, etc., but that's not the focus here.)

Note 1: When testing ideas, I recommend testing for size > 100 (or setting a variable, e.g. myThreshold <- 1000), rather than against a size of > 1M. Otherwise there will be a lot of pain in hitting the slow plotting. :)

350

asked Oct 15 '11 17:10

Iterator

1 Answer

The problem is that, as currently coded, myPlot() assumes x is a data object, but you then pass it a formula. R's plot() handles both cases via methods: when x is a formula, the plot.formula() method is dispatched instead of the basic plot.default() method.
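As a quick sketch of what the size tests see when they are handed a formula, and of the method that plot() dispatches to (the variable name f is only for illustration):

```r
## A formula can be built without `x` or `y` existing; nothing is evaluated yet
f <- y ~ x

class(f)      # "formula"
length(f)     # 3: the `~` operator plus its two operands, whatever the data size

## plot.formula() is registered as an S3 method but not exported,
## so it is looked up here with getS3method()
is.function(getS3method("plot", "formula"))   # TRUE
```

So length(x) inside the wrapper measures the formula itself, not the data it refers to, and a wrapper needs its own formula method to look at the data.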

You need to do the same:

myplot <- function(x, ...)
    UseMethod("myplot")

myplot.default <- function(x, ...) {
    ## nrow() is NULL for plain vectors, so the test can evaluate to NA
    isBad <- any((length(x) > 10^6) || (object.size(x) > 8*10^6) ||
                    (nrow(x) > 10^6))
    ## treat an NA result as "not too big"
    if(is.na(isBad)){isBad = FALSE}
    if(isBad){
        stop("No plots for you!")
    }
    invisible(plot(x, ...))
}

myplot.formula <- function(x, ...) {
    ## code here to process the formula into a data object for plotting
    ....
    myplot.default(processed_x, ...)
}

You can borrow code from plot.formula() (it is not exported, so view it with graphics:::plot.formula) to process x into a data object. Alternatively, you can roll your own, following the standard non-standard evaluation rules (PDF).
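One possible way to fill in the formula method, as a minimal sketch rather than a definitive implementation: it uses model.frame() to evaluate the formula's variables, checks the size of the result, and then hands the original formula back to plot(). The helper tooBig(), the small myThreshold value, and the simplified size test are assumptions made for illustration. It has not been tried against every argument plot() accepts, and arguments that rely on non-standard evaluation (such as subset) may need extra care.

```r
## Hypothetical threshold, kept small so that testing is quick
myThreshold <- 1000

## Hypothetical helper: TRUE when an object has too many elements or rows
tooBig <- function(obj) {
    ## NROW() gives length() for vectors and the row count for matrices/data frames
    NROW(obj) > myThreshold || length(obj) > myThreshold
}

myplot <- function(x, ...) UseMethod("myplot")

myplot.default <- function(x, ...) {
    if (tooBig(x)) stop("No plots for you!")
    invisible(plot(x, ...))
}

myplot.formula <- function(x, data = parent.frame(), ...) {
    ## Resolve the default now, so `data` is the caller's environment
    force(data)
    ## Evaluate the formula's variables so the check sees the data, not the formula
    mf <- model.frame(x, data = data)
    if (tooBig(mf)) stop("No plots for you!")
    ## Pass the original formula on, so plot.formula() still does the drawing
    invisible(plot(x, data = data, ...))
}

small_x <- rnorm(100);  small_y <- rnorm(100)
big_x   <- rnorm(2000); big_y   <- rnorm(2000)

myplot(small_y ~ small_x)                          # drawn as usual
res <- try(myplot(big_y ~ big_x), silent = TRUE)   # refused
inherits(res, "try-error")                         # TRUE
```

Re-dispatching to plot() with the formula, instead of plotting the model frame directly, keeps the axis labels and other behaviour of plot.formula() intact. The cost is that the model frame is built twice.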

200

answered Sep 29 '22 23:09

Gavin Simpson
