
Is there a vectorized parallel max() and min()?

I have a data.frame with columns "a" and "b". I want to add columns called "high" and "low" that contain the highest and the lowest among columns a and b.

Is there a way of doing this without looping over the rows of the data frame?

Edit: this is for OHLC data, so the high and low columns should contain the highest and lowest of a and b on the same row, not across the whole columns. Sorry if this is badly worded.

asked Apr 08 '11 07:04 by Generic Person




2 Answers

Sounds like you're looking for pmax and pmin ("parallel" max/min):

    Extremes                 package:base                  R Documentation

    Maxima and Minima

    Description:

         Returns the (parallel) maxima and minima of the input values.

    Usage:

         max(..., na.rm = FALSE)
         min(..., na.rm = FALSE)

         pmax(..., na.rm = FALSE)
         pmin(..., na.rm = FALSE)

         pmax.int(..., na.rm = FALSE)
         pmin.int(..., na.rm = FALSE)

    Arguments:

         ...: numeric or character arguments (see Note).

       na.rm: a logical indicating whether missing values should be
              removed.

    Details:

         ‘pmax’ and ‘pmin’ take one or more vectors (or matrices) as
         arguments and return a single vector giving the ‘parallel’ maxima
         (or minima) of the vectors.  The first element of the result is
         the maximum (minimum) of the first elements of all the arguments,
         the second element of the result is the maximum (minimum) of the
         second elements of all the arguments and so on.  Shorter inputs
         are recycled if necessary.  ‘attributes’ (such as ‘names’ or
         ‘dim’) are transferred from the first argument (if applicable).
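Applied to the question, a minimal sketch might look like the following. The data values are made up for illustration; only the column names a, b, high and low come from the question. The last lines show the na.rm argument and the recycling of shorter inputs described in the help text.

```r
# Made-up sample data; column names "a" and "b" follow the question
df <- data.frame(a = c(1.5, 3.2, 2.0), b = c(2.1, 3.0, 2.0))

# Row-wise (parallel) maximum and minimum, with no explicit loop
df$high <- pmax(df$a, df$b)
df$low  <- pmin(df$a, df$b)
print(df)

# By default an NA in either input gives NA at that position;
# na.rm = TRUE ignores it when another value is available
with_na    <- pmax(c(1, NA), c(2, 3))                # 2 NA
without_na <- pmax(c(1, NA), c(2, 3), na.rm = TRUE)  # 2 3

# Shorter inputs are recycled, so a scalar acts as a cap
capped <- pmin(c(4, 1, 7), 3)                        # 3 1 3
```

Because pmax and pmin accept any number of vectors, the same pattern extends to more than two columns, e.g. pmax(df$a, df$b, df$high).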
answered Oct 10 '22 06:10 by NPE


Here's a version I implemented using Rcpp. I compared pmin with my version, and my version is roughly 3 times faster.

    library(Rcpp)

    # Compile a C++ function returning the element-wise minimum of two numeric vectors
    cppFunction("
      NumericVector min_vec(NumericVector vec1, NumericVector vec2) {
        int n = vec1.size();
        // Length mismatch: return early without comparing anything
        if(n != vec2.size()) return 0;
        else {
          NumericVector out(n);
          for(int i = 0; i < n; i++) {
            out[i] = std::min(vec1[i], vec2[i]);
          }
          return out;
        }
      }
    ")

    # Benchmark on 100,000 elements
    x1 <- rnorm(100000)
    y1 <- rnorm(100000)

    microbenchmark::microbenchmark(min_vec(x1, y1))
    microbenchmark::microbenchmark(pmin(x1, y1))

    # Benchmark on 500,000 elements
    x2 <- rnorm(500000)
    y2 <- rnorm(500000)

    microbenchmark::microbenchmark(min_vec(x2, y2))
    microbenchmark::microbenchmark(pmin(x2, y2))

The microbenchmark function output for 100,000 elements is:

    > microbenchmark::microbenchmark(min_vec(x1, y1))
    Unit: microseconds
                expr     min       lq     mean  median       uq     max neval
     min_vec(x1, y1) 215.731 222.3705 230.7018 224.484 228.1115 284.631   100
    > microbenchmark::microbenchmark(pmin(x1, y1))
    Unit: microseconds
             expr     min       lq     mean  median      uq      max neval
     pmin(x1, y1) 891.486 904.7365 943.5884 922.899 954.873 1098.259   100

And for 500,000 elements:

    > microbenchmark::microbenchmark(min_vec(x2, y2))
    Unit: milliseconds
                expr      min       lq     mean   median       uq     max neval
     min_vec(x2, y2) 1.493136 2.008122 2.109541 2.140318 2.300022 2.97674   100
    > microbenchmark::microbenchmark(pmin(x2, y2))
    Unit: milliseconds
             expr      min       lq     mean   median       uq      max neval
     pmin(x2, y2) 4.652925 5.146819 5.286951 5.264451 5.445638 6.639985   100

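So the Rcpp version is faster at both sizes: comparing medians, about 4 times faster for 100,000 elements (224 vs. 923 microseconds) and about 2.5 times faster for 500,000 elements (2.14 vs. 5.26 milliseconds).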

You could improve it by adding error checking to the function: for instance, signal an informative error when the vectors differ in length instead of returning early, or check that they are comparable (not character vs. numeric, or logical vs. numeric).
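One way to sketch such checks is on the R side, in a small wrapper. The name checked_min and its argument f are hypothetical, introduced only for illustration; f defaults to pmin so the snippet runs without Rcpp, and min_vec could be passed instead once it has been compiled.

```r
# Hypothetical wrapper: validates the inputs, then delegates to f
# (pmin by default; min_vec from above could be passed as f)
checked_min <- function(x, y, f = pmin) {
  # is.numeric() is FALSE for character and logical vectors
  if (!is.numeric(x) || !is.numeric(y)) {
    stop("x and y must both be numeric vectors")
  }
  if (length(x) != length(y)) {
    stop("x and y must have the same length")
  }
  f(x, y)
}

ok <- checked_min(c(1, 5), c(3, 2))   # element-wise minimum: 1 2

# Mismatched lengths and mixed types now fail loudly
bad_length <- try(checked_min(c(1, 2), c(1, 2, 3)), silent = TRUE)
bad_type   <- try(checked_min(c(1, 2), c("a", "b")), silent = TRUE)
```

Checking in R keeps the C++ loop unchanged; the same conditions could equally be enforced inside the C++ function itself.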

answered Oct 10 '22 07:10 by Mario Becerra