I have the following data frame:
df <- data.frame(id = c("a", "a", "a", "a", "b", "b", "b", "b"),
                 time = 1:4,
                 value = c(100, NA, NA, 550, 300, NA, NA, 900))
Can someone suggest an approach for replacing the NA values in df by spreading the difference in the value column evenly over time? At time 1, a is 100, and at time 4, a is 550. How would one change the NAs at times 2 and 3 to 250 and 400 for a, and to 500 and 700 for b?
I can write a complex for loop to brute force it, but is there a more efficient solution?
Linear interpolation uses the formula y = y1 + ((x - x1) / (x2 - x1)) * (y2 - y1), where x is the point at which a value is needed, y is the interpolated value at x, (x1, y1) is the nearest known point below x, and (x2, y2) is the nearest known point above x.
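As a quick check of the formula against the numbers in the question, here is a minimal sketch in base R. The helper name lerp is made up for illustration and is not part of any package.

```r
# Hypothetical helper: linear interpolation between (x1, y1) and (x2, y2).
lerp <- function(x, x1, y1, x2, y2) {
  y1 + (x - x1) / (x2 - x1) * (y2 - y1)
}

# Group a: known points (1, 100) and (4, 550); fill times 2 and 3.
a_filled <- lerp(2:3, x1 = 1, y1 = 100, x2 = 4, y2 = 550)
print(a_filled)  # 250 400

# Group b: known points (1, 300) and (4, 900); fill times 2 and 3.
b_filled <- lerp(2:3, x1 = 1, y1 = 300, x2 = 4, y2 = 900)
print(b_filled)  # 500 700
```

Because the arithmetic is vectorised, both missing times in a group are filled in one call.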
In mathematics, linear interpolation is a method of curve fitting using linear polynomials to construct new data points within the range of a discrete set of known data points.
Linear interpolation determines the value of a function at an intermediate point when its values at two adjacent points are known; in other words, it estimates an unknown value that falls between two known values.
Linear interpolation is an imputation technique that assumes a linear relationship between data points and utilises non-missing values from adjacent data points to compute a value for a missing data point.
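You could use na.approx from the zoo package. Note that the call below interpolates along the whole column in row order and ignores id; it gives the expected result here only because the first and last value of each id are non-missing and the rows are sorted by time within id.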
library(zoo)
df$value <- na.approx(df$value)
df
# id time value
#1 a 1 100
#2 a 2 250
#3 a 3 400
#4 a 4 550
#5 b 1 300
#6 b 2 500
#7 b 3 700
#8 b 4 900
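If a group could start or end with an NA, interpolating the whole column at once would borrow values from the neighbouring id. A minimal sketch of a group-wise variant, assuming rows are sorted by time within each id and times are evenly spaced, combines base R's ave with na.approx:

```r
library(zoo)

df <- data.frame(id = c("a", "a", "a", "a", "b", "b", "b", "b"),
                 time = 1:4,
                 value = c(100, NA, NA, 550, 300, NA, NA, 900))

# Interpolate separately within each id.
# na.rm = FALSE keeps any leading/trailing NAs in place, so each group
# returns a vector of its original length (which ave requires).
df$value <- ave(df$value, df$id,
                FUN = function(v) na.approx(v, na.rm = FALSE))
df
```

For this data the result matches the output above. With unevenly spaced times, the x argument of na.approx can be used to supply the time values instead of relying on row position.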