 

R functions on sequential rows in a data frame

Tags: dataframe, r

I have a data frame made up of mostly sequential rows; "mostly" meaning that some rows are out of sequence or missing. When the next row in sequence after the current row is present, I'd like to apply some function using data from both rows. If it's not present, skip it and move on. I know I can do this with a loop, but it's quite slow. I suspect the solution involves indexing. Here is an example of my problem using sample data, with the desired result produced by a loop.

df <- data.frame(id=1:10, x=rnorm(10))
df <- df[c(1:3, 5:10), ]
df$z <- NA


dfLoop <- function(d)
{
  for(i in 1:(nrow(d)-1))
  {
    if(d[i+1, ]$id - d[i, ]$id == 1)
    {
      d[i, ]$z = d[i+1, ]$x - d[i, ]$x
    }
  }

  return(d)
}

dfLoop(df)

So how might I be able to get the same result without using a loop? Thanks for any help.

robbie asked Jan 19 '26 11:01


2 Answers

Give this a try:

index <- which(diff(df$id)==1) #gives the index of rows that have a row below in sequence

df$z[index] <- diff(df$x)[index]

As a function:

fun <- function(x) {
  index <- which(diff(x$id)==1)
  xdiff <- diff(x$x)
  x$z[index] <- xdiff[index]
  return(x)
}
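
To see why this works, it may help to look at what diff and which return for the ids in the question's sample data. A small sketch (the variable names ids, gaps and index are only for illustration):

```r
# ids as in the question's sample data: 1..10 with 4 missing
ids <- c(1:3, 5:10)

# diff() returns the difference between each element and the one before it,
# so the result has one element fewer than the input
gaps <- diff(ids)
print(gaps)    # 1 1 2 1 1 1 1 1

# which() turns the logical test into row positions whose next row is in sequence
index <- which(gaps == 1)
print(index)   # 1 2 4 5 6 7 8
```

Position 3 is absent from index because the row after id 3 has id 5, so that row's z stays NA.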

Compare with your loop:

a <- fun(df)
b <- dfLoop(df)
identical(a, b)
[1] TRUE
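
Since the question is about speed, here is a rough way to compare the two approaches on a larger data frame. This is only a sketch: the size n, the seed, and the pattern of dropped rows are arbitrary assumptions, and the timings will vary by machine, so none are quoted here.

```r
# Loop version from the question
dfLoop <- function(d) {
  for (i in 1:(nrow(d) - 1)) {
    if (d[i + 1, ]$id - d[i, ]$id == 1) {
      d[i, ]$z <- d[i + 1, ]$x - d[i, ]$x
    }
  }
  d
}

# Vectorized version from this answer
fun <- function(x) {
  index <- which(diff(x$id) == 1)
  xdiff <- diff(x$x)
  x$z[index] <- xdiff[index]
  x
}

# Larger sample data (sizes chosen arbitrarily for illustration)
set.seed(42)
n <- 2000
big <- data.frame(id = 1:n, x = rnorm(n))
big <- big[-seq(4, n, by = 7), ]   # drop every 7th row starting at row 4
big$z <- NA

# Time each approach and keep the results for comparison
t_loop <- system.time(b <- dfLoop(big))
t_vec  <- system.time(a <- fun(big))
print(t_loop)
print(t_vec)

# Both approaches should fill z with the same values
print(all.equal(a$z, b$z))
```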
alexwhan answered Jan 21 '26 02:01


R is vector-based. Try this code -- it is just like your for loop but using the entire range at once:

d <- df  # d is the data frame from the question
i <- 1:(nrow(d)-1)
d[i+1, ]$id - d[i, ]$id == 1

You should see a logical vector of length nrow(d) - 1 that is TRUE wherever the condition holds. Save it:

cond <- (d[i+1, ]$id - d[i, ]$id == 1)

You can also get the positions of all TRUE values:

(cond.pos <- which(cond))

Now you can assign values to those indexes where the condition is true:

d[cond.pos, ]$z <- d[cond.pos+1, ]$x - d[cond.pos, ]$x

There are quite a few ways to achieve what you want, but it takes some experience to grasp the "vector-based" idea. For this specific example, the diff function, as noted by alexwhan, saves some typing.
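
As one more illustration of the vector-based idea, the same result can be sketched with shifted copies of the columns and ifelse. This is not from either answer above, just one possible variant; the names next_id and next_x and the seed are made up for the example.

```r
# Sample data as in the question (seed added only so the run is repeatable)
set.seed(1)
df <- data.frame(id = 1:10, x = rnorm(10))
df <- df[c(1:3, 5:10), ]

# Shift each column up by one row; the last row has no successor, so pad with NA
next_id <- c(df$id[-1], NA)
next_x  <- c(df$x[-1], NA)

# Where the next id is exactly one higher, take the difference in x; otherwise NA.
# ifelse() also returns NA where the test itself is NA (the last row).
df$z <- ifelse(next_id - df$id == 1, next_x - df$x, NA)

print(df)
```

Here the row with id 3 and the last row both end up with NA in z, matching what the loop in the question produces.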

krlmlr answered Jan 21 '26 02:01


