How to improve performance of this linear interpolation

Question

For a given column in a dataframe, I want to construct a new vector which for each point consists of the average of the points on either side. However for the last observation it will instead be the second to last. And for the first observation it will be second. I wrote this R code to solve the issue, however I am calling it repeatedly and it is extremely slow. Can someone give some tips on how to do it more efficiently? Thanks.

x1 <- c(rep('a',100),rep('b',100),rep('c',100))
x2 <- rnorm(300)
x <- data.frame(x1,x2)
names(x) <- c('col1','data1')


a.linear.interpolation <- function(x) {
    require(zoo)
    require(data.table)

    a.dattab <- data.table(x)

    setkey(a.dattab,col1)

    #replace any NA values using LOCF / NOCB
    a.dattab[,data1:=na.locf(data1,na.rm=FALSE),by=list(col1)]
    a.dattab[,data1:=na.locf(data1,na.rm=FALSE,fromLast=TRUE),by=list(col1)]

    #Adding a within group sequence number and a size of group field to facilitate
    #row by row processing
    a.dattab[,grpseq:=seq_len(.N),by=list(col1)]
    a.dattab[,grpseq_max:=.N,by=list(col1)]

    #convert back to data.frame
    #data.frame seems faster than data.table for this row by row type processing
    a.df <- data.frame(a.dattab)

    new.col <- vector(length=nrow(a.df))

    for(i in seq(nrow(a.df))){
        if(a.df[i,"grpseq"]==1){
                new.col[i] <- a.df[i+1,"data1"]
            }
        else if(a.df[i,"grpseq"]==a.df[i,"grpseq_max"]){
                new.col[i] <- a.df[i-1,"data1"]
            }
        else {
                new.col[i] <- (a.df[i-1,"data1"]+a.df[i+1,"data1"])/2
            }
    }

    return(new.col)
}

thelatemail · Accepted Answer

Apart from using rollmeans, the base R filter function can do this sort of thing as well. E.g.:

linint <- function(vec) {
  c(vec[2], filter(vec, c(0.5, 0, 0.5))[-c(1, length(vec))], vec[length(vec) - 1])
}

x <- c(1,3,6,10,1)
linint(x)
#[1]  3.0  3.5  6.5  3.5 10.0

And it's pretty quick, chewing through 10M cases in less than a second:

x <- rnorm(1e7)
system.time(linint(x))
#user  system elapsed 
#0.57    0.18    0.75

How to improve performance of this linear interpolation

Tags:

r

interpolation

Antonio2100

1 Answers

thelatemail

Recent Activity

Donate For Us

How to improve performance of this linear interpolation

Tags:

r

interpolation

Antonio2100

1 Answers

thelatemail

Related questions

Recent Activity

Donate For Us