Split numeric vector into unequal sections, then apply a custom function to each section

I have a long sequence of 1s and 0s which represent bird incubation patterns, 1 being bird ON the nest, 0 being OFF.

    > Fake.data<- c(1,1,1,1,1,0,0,1,1,1,1,0,0,0,1,1,1,1,0,1,1,1,1,0,0,1,1,1,1,1,0,0,0,0,1,1,0,1,0)

As an end point I would like a single value per cycle: the ratio of each OFF period to the ON period that precedes it (OFF length divided by ON length). For Fake.data, the result should be a vector like this

    [1] 0.4  0.75  0.25  0.5  0.8  0.5  1 #(I just typed this out!) 

So far I have split the vector into sections using split()

    > Diff<-diff(Fake.data)
    > SPLIT<-split(Fake.data, cumsum(c(1, Diff > 0 )))
    > SPLIT

Which returns...

    $`1`
    [1] 1 1 1 1 1 0 0
    $`2`
    [1] 1 1 1 1 0 0 0
    $`3`
    [1] 1 1 1 1 0
    $`4`
    [1] 1 1 1 1 0 0
    $`5`
    [1] 1 1 1 1 1 0 0 0 0
    $`6`
    [1] 1 1 0
    $`7`
    [1] 1 0
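The grouping step above can be unpacked to show why each list element holds one ON run followed by one OFF run. This is only an illustrative sketch; the names `starts` and `group` are introduced here and are not part of the original code.

```r
Fake.data <- c(1,1,1,1,1,0,0,1,1,1,1,0,0,0,1,1,1,1,0,1,1,1,1,0,0,
               1,1,1,1,1,0,0,0,0,1,1,0,1,0)

# diff() is 1 at each 0 -> 1 transition, -1 at each 1 -> 0 transition, 0 otherwise
Diff <- diff(Fake.data)

# 1 marks the first element of each new ON run; the leading 1 opens the first group
starts <- c(1, Diff > 0)

# cumsum() turns the start flags into a running group id, one id per ON/OFF cycle
group <- cumsum(starts)

# number of observations in each ON/OFF cycle
table(group)
```

Because `group` only increases when a new ON run begins, split() keeps every OFF run in the same group as the ON run before it.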

So I can get the ratio for a single split group using

    > SPLIT$'1'<- ((length(SPLIT$'1'))-(sum(SPLIT$'1')))/sum(SPLIT$'1')
    > SPLIT$'1'
    [1] 0.4

However, in my data I have several thousand of these to do, and I would like to use something like tapply() or a for() loop to calculate this automatically for all groups and put the results into a single vector. I have tried each of these methods with little success, as the split() output structure does not seem to fit with these functions.

I create a new vector to receive the for() loop output

    ratio<-rep(as.character(NA),(length(SPLIT)))

Then I attempt the for() loop using the code above, which works for a single run.

    for(i in SPLIT$'1':'7')
    {ratio[i]<-((length(SPLIT$'[i]'))-(sum(SPLIT$'[i]')))/sum(SPLIT$'[i]')}

What I get is...

    [1] "NaN" "NaN" "NaN" "NaN" "NaN" "NaN" NA

Tried many other variations along this theme but now just really stuck!

Roasty247 asked Oct 28 '25 13:10


2 Answers

I think you were very close with your strategy. The sapply function is very happy to work with lists. I would just change the last step to

    sapply(SPLIT, function(x) sum(x==0)/sum(x==1))

which returns

       1    2    3    4    5    6    7 
    0.40 0.75 0.25 0.50 0.80 0.50 1.00 

with your sample data. No additional packages needed.
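For comparison, the for() loop from the question can also be made to work. In the original attempt, `SPLIT$'[i]'` looks up an element literally named `[i]`, which does not exist, so both the length and the sum are 0 and the ratio is 0/0 = NaN. A minimal sketch of a repaired loop, assuming SPLIT is rebuilt from the sample data and has not been overwritten:

```r
Fake.data <- c(1,1,1,1,1,0,0,1,1,1,1,0,0,0,1,1,1,1,0,1,1,1,1,0,0,
               1,1,1,1,1,0,0,0,0,1,1,0,1,0)
SPLIT <- split(Fake.data, cumsum(c(1, diff(Fake.data) > 0)))

# numeric (not character) result vector, one slot per ON/OFF cycle
ratio <- numeric(length(SPLIT))

for (i in seq_along(SPLIT)) {
  x <- SPLIT[[i]]                            # [[i]] extracts the i-th list element
  ratio[i] <- (length(x) - sum(x)) / sum(x)  # OFF count divided by ON count
}

ratio
```

The sapply() call does the same thing in one line, which is why it is usually preferred for lists.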

MrFlick answered Oct 30 '25 04:10


Here are two possibilities:

1) Compute the run lengths using rle. If the data starts with 0, the if statement drops the first length so that the sequence is guaranteed to begin with a run of 1s. Finally, compute the ratios using rollapply from the zoo package:

    library(zoo)

    lengths <- rle(Fake.data)$lengths
    if (Fake.data[1] == 0) lengths <- lengths[-1]

    rollapply(lengths, 2, by = 2, function(x) x[2]/x[1])

giving:

    [1] 0.40 0.75 0.25 0.50 0.80 0.50 1.00

The if line can be removed if we know that the data always starts with a 1.

2) If we can assume that the series always starts with a 1 and ends with a 0, then this one-liner would work:

    with( rle(Fake.data), lengths[values == 0] / lengths[values == 1] )

giving the same answer as above.
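If the start and end of the series cannot be guaranteed, the two ideas can be combined in base R. The following is a minimal sketch, assuming the input contains only 0s and 1s; `off_on_ratio` is a hypothetical helper name introduced here, not a function from base R or zoo.

```r
# off_on_ratio() is a hypothetical helper, shown only as a sketch
off_on_ratio <- function(x) {
  r <- rle(x)
  len <- r$lengths
  val <- r$values

  # drop a leading OFF run so that the first remaining run is ON
  if (length(val) > 0 && val[1] == 0) {
    len <- len[-1]
    val <- val[-1]
  }

  # drop a trailing ON run that has no OFF run after it
  n <- length(val)
  if (n > 0 && val[n] == 1) {
    len <- len[-n]
    val <- val[-n]
  }

  # runs now alternate ON, OFF, ON, OFF, ... so the two vectors pair up
  len[val == 0] / len[val == 1]
}

Fake.data <- c(1,1,1,1,1,0,0,1,1,1,1,0,0,0,1,1,1,1,0,1,1,1,1,0,0,
               1,1,1,1,1,0,0,0,0,1,1,0,1,0)

off_on_ratio(Fake.data)
off_on_ratio(c(0, 0, Fake.data, 1, 1))  # unmatched runs at both ends are ignored
```

Trimming the unmatched runs first keeps the two length vectors the same size, so the division never recycles values silently.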

G. Grothendieck answered Oct 30 '25 03:10