
How to vectorize a subsetting function in R?

Tags:

r

dplyr

I've had some luck vectorizing certain functions, which is great for clean code, avoiding loops, and speed.

However, I have not been able to vectorize any function that subsets a data frame based on its inputs.

Example

For example, this function works well when each argument is a single value:

library(dplyr)

test_funct <- function(sep_wid, sep_len) {
    iris %>% filter(Sepal.Width > sep_wid & Sepal.Length < sep_len) %>% .$Petal.Width %>% sum
}

test_funct(4, 6)

# [1] 0.7 # This works nicely

But when I provide vectors as inputs to this function, it returns a single number:

sep_wid_vector <- c(4, 3.5, 3)
sep_len_vector <- c(6, 6, 6.5)


test_funct(sep_wid_vector, sep_len_vector)

# [1] 9.1

The desired output is a vector of the same length as the input vectors, as though the function were run on the first elements of each vector, then the second, then the third:

# 0.7    4.2     28.5 

For convenience, here is the output when each pair of inputs is run separately:

test_funct(4, 6) # 0.7
test_funct(3.5, 6) # 4.2
test_funct(3, 6.5) # 28.5

How can I vectorize a function that subsets data based on its inputs so that it can receive vector inputs?

stevec asked Apr 08 '19 08:04




3 Answers

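The problem is that the comparisons inside filter are themselves vectorized: Sepal.Width > sep_wid compares the whole column against sep_wid, so a length-3 input is recycled along the 150 rows of iris instead of producing three separate subsets. Because 150 is a multiple of 3, R does not even warn about it.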
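To make the recycling concrete, here is a small base R sketch. The names x and thresholds are made up for illustration; the second half only checks that comparing an iris column against a length-3 vector is the same as comparing it against that vector repeated to the number of rows.

```r
# A short vector is repeated to match the longer one before comparing.
x <- 1:6
thresholds <- c(2, 4)              # recycled to 2, 4, 2, 4, 2, 4
recycled <- x > thresholds
print(recycled)
# [1] FALSE FALSE  TRUE FALSE  TRUE  TRUE

# The same thing happens with a data frame column: row i is compared
# against element ((i - 1) %% 3) + 1 of the length-3 vector.
sep_wid_vector <- c(4, 3.5, 3)
implicit <- iris$Sepal.Width > sep_wid_vector
explicit <- iris$Sepal.Width > rep_len(sep_wid_vector, nrow(iris))
print(identical(implicit, explicit))
# [1] TRUE
```

So the function is not looping over the three thresholds; it is applying them to rows by position, which is why a single sum comes back.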

One way to do this would be to use map2 from the purrr package:

library(purrr)
map2_dbl(sep_wid_vector, sep_len_vector, test_funct)

Of course you could then wrap this in a function. You might also want to consider passing in the data frame as a function parameter.
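A minimal sketch of that suggestion, assuming dplyr and purrr are installed. The function name sum_petal_width and the argument names data, w and l are hypothetical, chosen only for this example:

```r
library(dplyr)
library(purrr)

# Hypothetical wrapper: takes the data frame as its first argument and
# pairs up the elements of sep_wid and sep_len with map2_dbl().
sum_petal_width <- function(data, sep_wid, sep_len) {
  map2_dbl(sep_wid, sep_len, function(w, l) {
    data %>%
      filter(Sepal.Width > w, Sepal.Length < l) %>%
      pull(Petal.Width) %>%
      sum()
  })
}

result <- sum_petal_width(iris, c(4, 3.5, 3), c(6, 6, 6.5))
print(result)
# [1]  0.7  4.2 28.5
```

Passing the data frame in explicitly means the same function can be reused on a filtered or otherwise modified copy of iris. map2_dbl also raises an error if the two input vectors have incompatible lengths, rather than recycling silently.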

Tarquinnn answered Oct 22 '22 18:10


You can use Vectorize:

tv <- Vectorize(test_funct)

tv(sep_wid_vector, sep_len_vector)
# [1]  0.7  4.2 28.5

This is basically a wrapper around mapply. Be aware that under the hood you are running an *apply function, which is also a kind of loop, so this is a convenience rather than a speed-up.
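As a rough illustration of the mapply relationship, the sketch below calls both forms on a base R version of the scalar function. The name scalar_sum is hypothetical, and it is written without dplyr only to keep the example self-contained:

```r
# Hypothetical scalar function: expects one threshold per argument.
scalar_sum <- function(sep_wid, sep_len) {
  keep <- iris$Sepal.Width > sep_wid & iris$Sepal.Length < sep_len
  sum(iris$Petal.Width[keep])
}

sep_wid_vector <- c(4, 3.5, 3)
sep_len_vector <- c(6, 6, 6.5)

# Vectorize() returns a new function that dispatches through mapply().
via_vectorize <- Vectorize(scalar_sum)(sep_wid_vector, sep_len_vector)

# Calling mapply() directly pairs up the elements in the same way.
via_mapply <- mapply(scalar_sum, sep_wid_vector, sep_len_vector)

print(all.equal(via_vectorize, via_mapply))
```

Either form calls scalar_sum once per pair of inputs, so the cost grows with the length of the input vectors.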

thothal answered Oct 22 '22 17:10


Here is one way using sapply, which moves the loop over the inputs inside the function:

# function using sapply
test_funct <- function(sep_wid, sep_len) {
  sapply(seq_along(sep_wid), function(x) {
    sum(iris$Petal.Width[iris$Sepal.Width > sep_wid[x] & iris$Sepal.Length < sep_len[x]])
  })
}

# testing with single value
test_funct(4, 6)
# [1] 0.7

# testing with vectors
test_funct(sep_wid_vector, sep_len_vector)
# [1]  0.7  4.2 28.5

cropgen answered Oct 22 '22 17:10