Take a data frame like this one, which holds a measurement for each of 12 dates in 2005:
df <- data.frame("date" = c('2005-04-04','2005-04-19', '2005-04-26', '2005-05-05',
'2005-05-12', '2005-05-25', '2005-06-02', '2005-06-16', '2005-07-07', '2005-07-14',
'2005-07-21', '2005-08-04'), "numbers" = c(90,50,50,48,44,37,34,30,36,31,49,54))
I want to use this to create a value for every day of the year (days 1 to 365), i.e. a new data frame running from 01/01/2005 to 31/12/2005 that is filled in with values from a spline fitted through these 12 existing values.
When I try to do this using:
numbers <- df$numbers
x = spline(1:365, numbers)
I get
Error in xy.coords(x, y, setLab = FALSE) : 'x' and 'y' lengths differ
I'm not sure what is going wrong.
In the mathematical field of numerical analysis, spline interpolation is a form of interpolation where the interpolant is a special type of piecewise polynomial called a spline.
The linear spline consists of the line segments joining each pair of adjacent data points $(V_k, I_k)$ and $(V_{k+1}, I_{k+1})$. Each segment has a simple closed form: $I_k(V) = I_k + (I_{k+1} - I_k)\,\frac{V - V_k}{V_{k+1} - V_k}$ for $V \in [V_k, V_{k+1}]$ and $k = 0, 1, \dots, n-1$.
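To make the segment formula concrete, here is a minimal R sketch that evaluates it directly and checks the result against base R's approx(), which performs the same piecewise-linear interpolation. The helper name linear_spline() is hypothetical and exists only for illustration; R indexes from 1, so k runs from 1 to n-1 rather than 0 to n-1, and the sketch deliberately refuses to extrapolate outside the data range.

```r
# Piecewise-linear ("linear spline") interpolation written directly from
#   I_k(V) = I_k + (I_{k+1} - I_k) * (V - V_k) / (V_{k+1} - V_k),  V in [V_k, V_{k+1}]
# linear_spline() is a hypothetical helper for illustration only.
linear_spline <- function(V, I, v_new) {
  # k = index of the segment [V[k], V[k+1]] that contains each new point
  k <- findInterval(v_new, V, rightmost.closed = TRUE)
  # the formula only covers the data range, so stop instead of extrapolating
  stopifnot(all(k >= 1 & k < length(V)))
  I[k] + (I[k + 1] - I[k]) * (v_new - V[k]) / (V[k + 1] - V[k])
}

# day of year and value of the first three observations in the question
V <- c(94, 109, 116)
I <- c(90, 50, 50)
lin <- linear_spline(V, I, 94:116)

# base R's approx() does the same linear interpolation
all.equal(lin, approx(V, I, xout = 94:116)$y)  # TRUE
```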
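The error occurs because spline(x, y) treats its first two arguments as paired coordinates, so x and y must have the same length: 1:365 has 365 elements, but numbers has only 12. The fix is to pass the day of year of each observation as x and to supply the days you want values for through the xout argument. Getting rid of the error is easy; getting a sensible answer is harder.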
x <- as.POSIXlt(as.character(df$date))$yday + 1 ## day of year (start from 1)
y <- df$numbers  ## measurement at each date
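A hedged alternative for the day-of-year step, using only as.Date() and format() with the "%j" conversion; it also prints the range of the observed days. That range matters: the data only cover part of 2005, so any fit over days 1 to 365 has to extrapolate at both ends.

```r
df <- data.frame(
  date = c("2005-04-04", "2005-04-19", "2005-04-26", "2005-05-05",
           "2005-05-12", "2005-05-25", "2005-06-02", "2005-06-16",
           "2005-07-07", "2005-07-14", "2005-07-21", "2005-08-04"),
  numbers = c(90, 50, 50, 48, 44, 37, 34, 30, 36, 31, 49, 54)
)
dates <- as.Date(df$date)

# "%j" formats the day of the year as "001" ... "366"
doy <- as.integer(format(dates, "%j"))

# agrees with the POSIXlt approach ($yday counts from 0, hence + 1)
all.equal(doy, as.POSIXlt(dates)$yday + 1L)  # TRUE

range(doy)  # 94 216
```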
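R offers several interpolating spline methods ("fmm", "periodic", "natural", "monoH.FC" and "hyman"), but not all of them are applicable here. Note also that spline() itself only accepts "fmm", "periodic", "natural" and "hyman"; "monoH.FC" is available through splinefun().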
y1 <- spline(x, y, xout = 1:365, method = "fmm")
y2 <- spline(x, y, xout = 1:365, method = "periodic")
#Warning message:
#In spline(x, y, xout = 1:365, method = "periodic") :
# spline: first and last y values differ - using y[1] for both
y3 <- spline(x, y, xout = 1:365, method = "natural")
y4 <- spline(x, y, xout = 1:365, method = "monoH.FC")
#Error in spline(x, y, xout = 1:365, method = "monoH.FC") :
# invalid interpolation method
y5 <- spline(x, y, xout = 1:365, method = "hyman")
#Error in spline(x, y, xout = 1:365, method = "hyman") :
# 'y' must be increasing or decreasing
See ?spline for details of those methods and the assumptions or requirements of each.
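The "invalid interpolation method" error for "monoH.FC" comes from spline() not offering that method; splinefun() does (see ?splinefun). A minimal sketch of calling it that way, which returns a function to evaluate on days 1 to 365. Whether its behaviour outside the observed range is sensible for this data is not checked here.

```r
df <- data.frame(
  date = c("2005-04-04", "2005-04-19", "2005-04-26", "2005-05-05",
           "2005-05-12", "2005-05-25", "2005-06-02", "2005-06-16",
           "2005-07-07", "2005-07-14", "2005-07-21", "2005-08-04"),
  numbers = c(90, 50, 50, 48, 44, 37, 34, 30, 36, 31, 49, 54)
)
x <- as.POSIXlt(as.character(df$date))$yday + 1  # day of year (start from 1)
y <- df$numbers

# splinefun() returns an interpolating function instead of a list of points;
# "monoH.FC" is a Fritsch-Carlson monotone Hermite spline
f <- splinefun(x, y, method = "monoH.FC")
y4 <- f(1:365)
```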
So only y1 and y3 were obtained without problems. Let's plot them.
par(mfrow = c(1, 2))
plot(y1, type = "l", main = "fmm"); points(x, y, pch = 19)
plot(y3, type = "l", main = "natural"); points(x, y, pch = 19)
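As the plots show, both fits behave badly when extrapolating: the observations only cover day 94 (4 April) to day 216 (4 August), so the curves over the rest of the year are pure extrapolation.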
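One possible workaround, which is not part of the original answer and rests on the assumption that holding the end values flat is acceptable for this data: use the "fmm" spline inside the observed range, replace the extrapolated values with the first and last observed measurements, and attach real dates to get the 01/01/2005 to 31/12/2005 data frame the question asks for. The object name full_year is illustrative.

```r
df <- data.frame(
  date = c("2005-04-04", "2005-04-19", "2005-04-26", "2005-05-05",
           "2005-05-12", "2005-05-25", "2005-06-02", "2005-06-16",
           "2005-07-07", "2005-07-14", "2005-07-21", "2005-08-04"),
  numbers = c(90, 50, 50, 48, 44, 37, 34, 30, 36, 31, 49, 54)
)
x <- as.POSIXlt(as.character(df$date))$yday + 1  # day of year (start from 1)
y <- df$numbers

days <- 1:365  # 2005 is not a leap year
# "fmm" spline evaluated on every day of the year
fit <- spline(x, y, xout = days, method = "fmm")$y

# assumption: outside the observed range, hold the nearest observed value
fit[days < min(x)] <- y[which.min(x)]
fit[days > max(x)] <- y[which.max(x)]

full_year <- data.frame(
  date = seq(as.Date("2005-01-01"), as.Date("2005-12-31"), by = "day"),
  numbers = fit
)
head(full_year)
```

Other end rules (for example NA outside the data range) follow the same pattern; the choice depends on what the measurements represent.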