Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Looping with dplyr on each row of dataframe

I have a dataframe

df <- data.frame(var1=c(10,20,30,40,50), var2=c(rep(0.3,5)), BYGROUP_OBSNUM=c(0:4))

var1 var2 BYGROUP_OBSNUM
10   0.3          0
20   0.3          1
30   0.3          2
40   0.3          3
50   0.3          4

I need to perform calculations for each row using dplyr as my real dataframe is really huge and dplyr is very efficient.

What I want:

var1    var2    BYGROUP_OBSNUM  VAR1_NEW
10       0.3       0              10
20       0.3       1              23
30       0.3       2              36.9
40       0.3       3              51.07
50       0.3       4              65.321

How is this achieved:

var1    var2    BYGROUP_OBSNUM  VAR1_NEW    
10       0.3         0           10
20       0.3         1           20+10*0.3
30       0.3         2           30+20*0.3+10*0.3^2
40       0.3         3           40+30*0.3+20*0.3^2+10*0.3^3
50       0.3         4           50+40*0.3+30*0.3^2+20*0.3^3+10*0.3^4

Therefore for each row the formula is:

 var1[i]+lag(var1,1)*var2^1+lag(var1,2)*var2^2 +.... 

till the lag(var1) reaches the row where BYGROUP_OBSNUM is 0

What I have achieved till now:

df1<-df %>%
   mutate(var3=ifelse ((!(var2==0) | (!(BYGROUP_OBSNUM==0))),  var2, 0)) %>%
    rowwise()%>%
     ungroup() %>%
      mutate(var1_new=var1+lag(var1,1)*var2)

I need to change the last line such that the formula takes the lag from lag(var1,1) till lag(var1,BYGROUP_OBSNUM) for each row and power of var2 also increases from 1 to BYGROUP_OBSNUM. How do I do this?

like image 763
Riya Avatar asked Jun 27 '17 09:06

Riya


2 Answers

Here is a custom function that can be used with dplyr to yield the results you are after. It works with the group_by function as well.

my.func <- function(x){
             mapply(function(v1,v2,n) {
               if(n == 1) {
                 as.numeric(v1[n])
               } else {
                 sum(v1, x[rev(seq(1:(n-1))),1] * v2 ^ seq(1:(n-1)))
               }
             }, x[,"var1"], x[,"var2"], seq(1:nrow(x)))
           }


df <- df %>% 
      # group_by(COLUMNS, TO, GROUP, BY) %>% 
      do(data.frame(., my.func(.))) %>%
      select(var1, var2, BYGROUP_OBSNUM, VAR1_NEW = my.func...)
like image 155
Matt Jewett Avatar answered Oct 29 '22 22:10

Matt Jewett


Made the final solution to--

df<-data.frame(var1=c(1:10),var2=c(rep(c(0,0.1),each=5)),BYG‌​ROUP_OBSNUM=c(0:4)) 

my.func <- function(x){mapply(function(v1,v2,v3,n) {
              if(v2==0 | v3==0){ as.numeric(v1) } 
              else { 
              sum(v1, x[rev(seq(1:(n-1))),1][1:v3] * v2 ^ seq(1:(n-1))[1:v3]) } },
             x[,"var1"], x[,"var2"], x[,"BYGROUP_OBSNUM"],seq(1:nrow(x))) 
            }

df1 <- df %>%   
         do(data.frame(., my.func(.))) %>% 
          mutate(VAR1_NEW = my.func...)%>% 
           select(-my.func...)

completed 100k rows in 1.42 mins! This function helped a lot! Thanks!

like image 30
Riya Avatar answered Oct 29 '22 21:10

Riya