
R: pass multiple arguments to accumulate/reduce

This is related to R: use the newly generated data in the previous row

I realized that my actual problem is a bit more complicated than the example I gave in the thread above: it seems I have to pass 3 arguments to the recursive calculation to achieve what I want, so accumulate2 or reduce may not work. I am opening a new question here to avoid possible confusion.

I have the following dataset grouped by ID:

ID <- c(1, 2, 2, 3, 3, 3)
pw <- c(1:6)
add <- c(1, 2, 3, 5, 7, 8)
x <- c(1, 2, NA, 4, NA, NA)
df <- data.frame(ID, pw, add, x)

df
  ID pw add  x
1  1  1   1  1
2  2  2   2  2
3  2  3   3 NA
4  3  4   5  4
5  3  5   7 NA
6  3  6   8 NA

Within each group, I want to keep the first value of x as it is and fill in the remaining rows recursively: take the value of x from the previous row, raise it to the power stored in pw, and then add the value in add. Each newly computed value should serve as the lagged value for the next row. So I would like to have:

  ID pw add  x
1  1  1   1  1
2  2  2   2  2
3  2  3   3 2^3 + 3
4  3  4   5  4
5  3  5   7 4^5 + 7
6  3  6   8 (4^5 + 7)^6 + 8 
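
As a quick sanity check, the three filled-in cells can be evaluated directly in base R. This is only a sketch of the arithmetic; the names row3, row5 and row6 are illustrative and do not appear in the data.

```r
# Group ID == 2: the second row uses the first row's x (2), pw = 3, add = 3
row3 <- 2^3 + 3
# Group ID == 3: the second row uses x = 4, pw = 5, add = 7
row5 <- 4^5 + 7
# The third row of that group reuses the value just computed for row 5
row6 <- row5^6 + 8

row3  # 11
row5  # 1031
row6  # roughly 1.2e18; above 2^53, so a double stores it only approximately
```

Because the last value exceeds 2^53, any solution that works in doubles will return it rounded to the nearest representable number.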

I have to apply this calculation to a large dataset, so it would be perfect if there is a fast way to do this!

dianaiii asked Apr 19 '21 01:04


3 Answers

To use accumulate2, pass the arguments in the right order: it takes two input vectors, here 'pw' and 'add', plus an initial value (.init), which is the first value of 'x'. Since the calculation is done per 'ID', group first and then call accumulate2. The first elements of 'pw' and 'add' are dropped because the first row of each group is supplied by .init. Inside the lambda, ..1 is the accumulated value, ..2 is the current element of 'pw', and ..3 is the current element of 'add', so the recursive step is ..1^..2 + ..3.

library(dplyr)
library(purrr)
out <- df %>%
   group_by(ID) %>% 
   mutate(x1 = accumulate2(pw[-1], add[-1], ~  ..1^..2 + ..3, 
             .init = first(x)) %>%
                flatten_dbl ) %>%
   ungroup

out$x1
#[1]    1                   2                  11   
#[4]    4                1031 1201024845477409792
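
To see which value lands in which placeholder, it may help to run accumulate2 on a single group with a named three-argument function in place of the `~` lambda. This is a minimal sketch; the function name step and the argument names acc, p and a are made up for illustration.

```r
library(purrr)

# accumulate2() calls .f(accumulated value, element of .x, element of .y)
step <- function(acc, p, a) acc^p + a

# Group ID == 3 from the question: x starts at 4, then pw = 5, 6 and add = 7, 8
res <- accumulate2(c(5, 6), c(7, 8), step, .init = 4)

# The result may come back as a list, so unlist() is used to get a numeric vector
unlist(res)
```

The three elements correspond to 4, 4^5 + 7 and (4^5 + 7)^6 + 8, in that order.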

If the recursive step needs more than 3 arguments, a for loop is easier to extend:

# initialize an empty vector
out <- c()
# loop over the unique IDs
for (id in unique(df$ID)) {
  # create a temporary subset of the data for that id
  tmp_df <- subset(df, ID == id)
  # initialize temporary storage for the output
  tmp_out <- numeric(nrow(tmp_df))
  # set the first value to the first element of x
  tmp_out[1] <- tmp_df$x[1]
  # if the group has more than one row
  if (nrow(tmp_df) > 1) {
    # loop over the remaining rows
    for (i in 2:nrow(tmp_df)) {
      # do the recursive calculation and update
      tmp_out[i] <- tmp_out[i - 1]^tmp_df$pw[i] + tmp_df$add[i]
    }
  }
  # append this group's result
  out <- c(out, tmp_out)
}

out
#[1] 1                   2                  11     
#[4] 4                1031 1201024845477409792
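
Because `out <- c(out, tmp_out)` copies the growing vector on every pass, a variant that fills a preallocated vector per group and then reassembles the groups may scale better on large data. The following is a sketch in base R under that assumption, with no timings taken; fill_recursive is a hypothetical helper name and x1 is an arbitrary name for the new column.

```r
# Data from the question
df <- data.frame(ID = c(1, 2, 2, 3, 3, 3), pw = 1:6,
                 add = c(1, 2, 3, 5, 7, 8), x = c(1, 2, NA, 4, NA, NA))

# Hypothetical helper: apply the recursion to the vectors of one group
fill_recursive <- function(x, pw, add) {
  out <- numeric(length(x))
  out[1] <- x[1]
  # seq_along(x)[-1] is empty for one-row groups, so the loop is skipped
  for (i in seq_along(x)[-1]) {
    out[i] <- out[i - 1]^pw[i] + add[i]
  }
  out
}

# Row positions per ID, one computation per group, then restore row order
rows <- split(seq_len(nrow(df)), df$ID)
filled <- lapply(rows, function(i) fill_recursive(df$x[i], df$pw[i], df$add[i]))
df$x1 <- unsplit(filled, df$ID)
```

A step that needs further columns only requires extra arguments to the helper; the split and unsplit calls stay the same.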
akrun answered Sep 18 '22 17:09


In base R, the following solution also works when the step needs more than two arguments.

  • First, I subset the original data set by ID value.
  • Then I take the row positions with seq_len(nrow(tmp))[-1], omitting the first row because its value is supplied through init.
  • In the anonymous function passed to Reduce, b is the accumulated (previous) value, starting from init, and c is the current element of the vector being iterated over, which here is a row position.
  • So in every iteration the previous value (starting from init) is raised to the power of the current value of pw, and the current value of add is added to it.
cbind(df[-length(df)], unlist(lapply(unique(df$ID), function(a) {
  tmp <- subset(df, df$ID == a)
  Reduce(function(b, c) {
    b ^ tmp$pw[c] + tmp$add[c]
  }, init = tmp$x[1],
  seq_len(nrow(tmp))[-1], accumulate = TRUE)
}))) |> setNames(c(names(df)))

  ID pw add            x
1  1  1   1 1.000000e+00
2  2  2   2 2.000000e+00
3  2  3   3 1.100000e+01
4  3  4   5 4.000000e+00
5  3  5   7 1.031000e+03
6  3  6   8 1.201025e+18
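
To isolate what Reduce does in this solution, the sketch below runs it on a single group (ID == 3) outside of lapply. The values are copied from the question; the names x1, acc and i are illustrative only.

```r
# Values for the group ID == 3
pw  <- c(4, 5, 6)
add <- c(5, 7, 8)
x1  <- 4  # first x of the group, used as the starting value

# Iterate over row positions 2 and 3; acc carries the previous result
res <- Reduce(function(acc, i) acc^pw[i] + add[i],
              x = 2:3, init = x1, accumulate = TRUE)

res  # numeric vector of length 3: 4, 4^5 + 7, (4^5 + 7)^6 + 8
```

Iterating over positions rather than values is what lets the function read from any number of columns at each step.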

Data

structure(list(ID = c(1, 2, 2, 3, 3, 3), pw = 1:6, add = c(1, 
2, 3, 5, 7, 8), x = c(1, 2, NA, 4, NA, NA)), class = "data.frame", row.names = c(NA, 
-6L))
Anoushiravan R answered Sep 18 '22 17:09


Base R, using a while() loop rather than Reduce():

# Split-apply-combine while loop: res => data.frame
res <- do.call(rbind, lapply(with(df, split(df, ID)), function(y) {
  # While there are any NAs in x:
  while (any(is.na(y$x))) {
    # Store the index of the first NA value: idx => integer scalar
    idx <- with(y, head(which(is.na(x)), 1))
    # Calculate x at that index using the business rule provided:
    # x => numeric vector
    y$x[idx] <- with(y, x[(idx - 1)]^pw[idx] + add[idx])
  }
  # Explicitly return the completed group: y => data.frame
  y
}))

Or, using a recursive function:

# Recursive function: estimation_func => function() 
estimation_func <- function(value_vec, exponent_vec, add_vec){
  # Specify the termination condition; when all elements 
  # of value_vec are no longer NA:
  if(all(!(is.na(value_vec)))){
    # Return value_vec: numeric vector => GlobalEnv
    return(value_vec)
  # Otherwise recursively apply the below: 
  }else{
    # Store the index of the first na value: idx => integer vector
    idx <- Position(is.na, value_vec)
    # Calculate the value of the value_vec at that index; 
    # using the provided business logic: value_vec => numeric vector
    value_vec[idx] <- (value_vec[(idx-1)] ** exponent_vec[idx]) + add_vec[idx]
    # Recursively apply function: function => Local Env
    return(estimation_func(value_vec, exponent_vec, add_vec))
  }
}

# Split data.frame into a list on ID; 
# Overwrite x values, applying recursive function;
# Combine list into a data.frame
# res => data.frame
res <- data.frame( 
  do.call(
    rbind, 
    Map(function(y){y$x <- estimation_func(y$x, y$pw, y$add); y}, split(df, df$ID))
  ), row.names = NULL
)
hello_friend answered Sep 21 '22 17:09