
Vectorize my thinking: Vector Operations in R

Tags: r, vector

So earlier I answered my own question on thinking in vectors in R. But now I have another problem which I can't 'vectorize.' I know vectorized operations are usually faster than loops, but I can't figure out how to do this in a vectorized way:

I have a data frame (which for sentimental reasons I like to call my.data) that I want to do a full marginal analysis on. I need to remove one element, 'value' the data frame without it, put that element back, and then repeat with only the next element removed. And again... and again... The idea is to do a full marginal analysis on a subset of my data. Anyhow, I can't conceive of how to do this in a vectorized, efficient way.

I've shortened the looping part of the code down and it looks something like this:

for (j in my.data$item[my.data$fixed==0]) { # <-- selects the items I want to loop 
                                            #     through
    my.data.it <- my.data[my.data$item!= j,] # <-- this kicks item j out of the list
    sum.data <-aggregate(my.data.it, by=list(year), FUN=sum, na.rm=TRUE) #<-- do an
                                                                         # aggregation

    do(a.little.dance) && make(a.little.love) -> get.down(tonight) # <-- a little
                                                                   #  song and dance

    delta <- (get.love)                                         # <-- get some love
    delta.list<-append(delta.list, delta, after=length(delta.list)) #<-- put my love
                                                                    #    in a vector 
}

So obviously I hacked out a bunch of stuff in the middle, just to make it less clumsy. The goal would be to remove the j loop using something more vector efficient. Any ideas?

JD Long asked Jan 14 '09 23:01


2 Answers

Here's what seems like another very R-type way to generate the sums. Generate a vector that is as long as your input vector, containing nothing but the sum of all n elements, repeated. Then, subtract your original vector from the sums vector. The result: a vector (isums) where each entry is the sum of your original vector less the ith element.

> (my.data$item[my.data$fixed==0])
[1] 1 1 3 5 7
> sums <- rep(sum(my.data$item[my.data$fixed==0]),length(my.data$item[my.data$fixed==0]))
> sums
[1] 17 17 17 17 17
> isums <- sums - (my.data$item[my.data$fixed==0])
> isums
[1] 16 16 14 12 10
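
For anyone who wants to run this, here is a minimal self-contained sketch of the same idea. The toy data frame is made up for illustration (only the column names come from the question), and it leans on R's recycling of the scalar total in place of rep(). The loop version is included only as a cross-check.

```r
# Hypothetical toy data: column names mirror the question, values are invented.
my.data <- data.frame(
  item  = c(1, 1, 3, 5, 7, 9),
  fixed = c(0, 0, 0, 0, 0, 1)
)

x <- my.data$item[my.data$fixed == 0]

# sum(x) is a single number; R recycles it across every element of x,
# so each entry becomes "total minus this element".
isums <- sum(x) - x

# The same thing done the slow way, dropping one element at a time.
loop.sums <- sapply(seq_along(x), function(i) sum(x[-i]))
```

This only works because the total is a plain sum, so removing one element is the same as subtracting it. A valuation that is not additive would probably still need a per-item function call.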
Wil Doane answered Nov 16 '22 19:11


Strangely enough, learning to vectorize in R is what helped me get used to basic functional programming. A basic technique would be to define your operations inside the loop as a function:

data = ...;
items = ...;

leave_one_out = function(i) {
   data1 = data[items != i];
   delta = ...;  # some operation on data1
   return(delta);  # return() is a function in R, so it needs parentheses
}


delta.list = c();  # start empty so the first cbind has something to extend
for (j in items) {
   delta.list = cbind(delta.list, leave_one_out(j));
}

To vectorize, all you do is replace the for loop with the sapply mapping function:

delta.list = sapply(items, leave_one_out);
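
As a rough, runnable sketch of that pattern: the data frame, the value column, and the "total value" valuation below are all assumptions standing in for the parts elided above, not the asker's real calculation. vapply is used here as a stricter relative of sapply.

```r
# Hypothetical toy data: item ids, a fixed flag, a year, and a value to total.
my.data <- data.frame(
  item  = c("a", "b", "c", "d"),
  fixed = c(0, 0, 1, 0),
  year  = c(2007, 2007, 2008, 2008),
  value = c(10, 20, 30, 40),
  stringsAsFactors = FALSE
)

# Only the non-fixed items get dropped in turn.
items <- my.data$item[my.data$fixed == 0]

# Stand-in valuation: total value of the data frame with item i removed.
leave_one_out <- function(i) {
  data1 <- my.data[my.data$item != i, ]
  sum(data1$value, na.rm = TRUE)
}

# vapply behaves like sapply but checks that each result is one number.
delta.list <- vapply(items, leave_one_out, numeric(1))
```

Note that sapply and vapply still call the function once per item, so this mostly tidies the code and avoids growing a vector inside the loop. It is not the same kind of speedup as a truly vectorized expression.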
bubaker answered Nov 16 '22 19:11