
Apply in R: recursive function that operates on its own previous result

Tags: r, apply, cumsum

How do I apply a function that can "see" the preceding result when operating by rows?

This comes up a lot, but my current problem requires a running total by student that resets if the total doesn't get to 5.

Example Data:

> df

row   Student Absent Consecutive.Absences                             
1        A       0                    0                              
2        A       1                    1                              
3        A       1                    2                              
4        A       0                    0 <- resets to zero if under 5
5        A       0                    0                              
6        A       1                    1                              
7        A       1                    2                              
8        A       1                    3                              
9        B       1                    1 <- starts over for new factor (Student)
10       B       1                    2                              
11       B       0                    0                              
12       B       1                    1                              
13       B       1                    2                              
14       B       1                    3                              
15       B       1                    4                              
16       B       0                    0                              
17       B       1                    1                              
18       B       1                    2                              
19       B       1                    3                              
20       B       1                    4                              
21       B       1                    5                              
22       B       0                    5 <- gets locked at 5
23       B       0                    5                              
24       B       1                    6                              
25       B       1                    7             

I've tried doing this with a huge matrix of shifted vectors.

I've tried doing this with the apply family of functions and half of them do nothing, the other half hit 16GB of RAM and crash my computer.

I've tried straight looping and it takes 4+ hours (it's a big data set)

What bothers me is how easy this is in Excel. Usually R runs circles around Excel both in speed and writability, which leads me to believe I'm missing something elementary here.

Forgetting even the more challenging ("lock at 5") feature of this, I can't even get a cumsum that resets. There is no combination of factors I can think of to group for ave like this:

Consecutive.Absences = ave(Absent, ..., cumsum)

Obviously, grouping on Student will just give the Total Cumulative Absences -- it "remembers" the kid's absence over the gaps because of the split and recombine in ave.

So as I said, the core of what I don't know how to do in R is this:
How do I apply a function that can "see" the preceding result when operating by rows?

In Excel it would be easy:

C3 = IF($A3=$A2,$B3+$C2,$B3)*$B3

This excel function is displayed without the 5-absence lock for easy readability.

Once I figure out how to apply a function that looks at previous results of the same function in R, I'll be able to figure out the rest.

Thank you in advance for your help--this will be very useful in a lot of my applications!

Genuinely, Sam


UPDATE:
Thank you everyone for the ideas on how to identify if a student has 5 consecutive absences!

However, that's easy enough to do in the database at the STUDENTS table. What I need to know is the number of consecutive absences by student in the attendance record itself for things like, "Do we count this particular attendance record when calculating other summary statistics?"

Sam asked Apr 15 '13 20:04



1 Answer

If you want to apply a function to every element of a vector while making use of the result computed for the previous element, check out Reduce with the accumulate parameter set to TRUE.

Here's an example:

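## Define a function that takes two parameters:
## 'sum' is the previous result and 'x' is the current element
runSum <- function(sum, x){
    res <- 0
    if (x == 1){
        # absent: extend the running count
        res <- sum + 1
    }
    else if (x == 0 && sum < 5){
        # present before the count reached 5: reset to zero
        res <- 0
    }
    else{
        # present after the count reached 5: stay locked
        res <- sum
    }
    res
}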

# Let's look at the Absent values for Student B
x <- c(1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1)

Reduce(x = x, f = runSum, accumulate = TRUE)
# [1] 1 2 0 1 2 3 4 0 1 2 3 4 5 5 5 6 7
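
The example above covers a single student. One possible way to run it per student over the whole data frame is to wrap the Reduce call in ave, grouped on Student. This is only a sketch: it assumes the columns are named Student and Absent as in the question, that Absent contains only 0 and 1, and that rows are already in date order within each student. The data frame df is rebuilt here from the question's example so the block runs on its own.

```r
# Step function: 'sum' is the previous result, 'x' is the current Absent value.
runSum <- function(sum, x) {
  if (x == 1) {
    sum + 1          # absent: extend the running count
  } else if (sum < 5) {
    0                # present before reaching 5: reset
  } else {
    sum              # present after reaching 5: stay locked
  }
}

# Example data rebuilt from the question (assumed column names).
df <- data.frame(
  Student = rep(c("A", "B"), times = c(8, 17)),
  Absent  = c(0, 1, 1, 0, 0, 1, 1, 1,
              1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1)
)

# ave() splits Absent by Student, applies the function to each piece,
# and writes the results back in the original row order.
df$Consecutive.Absences <- ave(
  df$Absent, df$Student,
  FUN = function(a) Reduce(runSum, a, accumulate = TRUE)
)
```

Because the split happens on Student, the count starts over at each new student, while the reset and the lock at 5 are handled inside runSum. Reduce is still an R-level loop within each student, so how it performs on a very large data set would need to be measured.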
kith answered Nov 11 '22 14:11