I have weather data for about six weather stations. For each station I have the longitude and latitude, plus a datetime for every observation (one every 10 minutes, starting around the beginning of 2016). I want to use kriging to interpolate values at other long/lat locations between these stations.
I know that scikit-learn has the 'GaussianProcessRegressor', which can be used for kriging. However, I do not understand how to include the temporal dimension in the fitting process. Is this even possible, or should I fit a separate model for every datetime I have?
X must be an array of features, which in my case would be the latitude and longitude (I think). X is now a list of 6 lat/long pairs (e.g. [52.1093, 5.181]), one per station. I took one date to test the GPR. y is a list of length 6 that contains the dew points for those stations at that specific time.
The problem is that I actually want to do kriging for all the datetimes. How do I incorporate them? Should I add the datetime components as features in the X array (e.g. [52.1093, 5.181, 2017, 1, 2, 10, 50])? This looks really weird to me, but I can't find any other way to model the temporal factor.
My code for fitting the GaussianProcessRegressor:
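from sklearn.gaussian_process import GaussianProcessRegressor

# `datetime` holds the single timestamp picked for this test
one_date = meteo_df[meteo_df['datetime'] == datetime].drop_duplicates(subset=['long', 'lat'], keep='last')
long = one_date['long']
lat = one_date['lat']
# One [lat, long] feature row per station
x = [[la, lo] for la, lo in zip(lat, long)]
y = list(one_date['dew_point'])
GPR = GaussianProcessRegressor(n_restarts_optimizer=10)
GPR.fit(x, y)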
I am assuming that you want out-of-the-box solutions. You have a few options, although some feel a bit hacky to me.
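The simplest route is the one raised in the question itself: fit a separate, purely spatial model for every datetime. Below is a minimal sketch of that idea, not a definitive implementation. The column names follow the question's `meteo_df`; the helper name `fit_per_timestamp`, the kernel choice (RBF plus a noise term) and the synthetic data are assumptions made only for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel


def fit_per_timestamp(meteo_df, value_col="dew_point"):
    """Hypothetical helper: fit one spatial GP per timestamp."""
    models = {}
    for ts, group in meteo_df.groupby("datetime"):
        # Keep one row per station, as in the question's code
        group = group.drop_duplicates(subset=["long", "lat"], keep="last")
        X = group[["lat", "long"]].to_numpy()
        y = group[value_col].to_numpy()
        # Assumed kernel: smooth spatial signal plus measurement noise
        kernel = RBF(length_scale=1.0) + WhiteKernel(noise_level=0.1)
        gpr = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
        models[ts] = gpr.fit(X, y)
    return models


# Synthetic stand-in for meteo_df: 6 stations, 3 timestamps (made-up values)
rng = np.random.default_rng(0)
stations = np.array([[52.10, 5.18], [52.30, 4.76], [51.45, 5.38],
                     [53.20, 6.57], [51.97, 4.45], [52.64, 4.98]])
times = pd.date_range("2017-01-02 10:00", periods=3, freq="10min")
rows = [
    {"datetime": ts, "lat": lat, "long": lon,
     "dew_point": 5.0 + 2.0 * (lat - 52.0) + rng.normal(scale=0.3)}
    for ts in times
    for lat, lon in stations
]
meteo_df = pd.DataFrame(rows)

models = fit_per_timestamp(meteo_df)

# Interpolate at an unobserved location for the first timestamp
mean, std = models[times[0]].predict(np.array([[52.0, 5.0]]), return_std=True)
```

This ignores any correlation between consecutive timestamps, and with only six points per fit the kernel hyperparameters are poorly constrained, so treat the predicted uncertainties with caution.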
Graeler et al. (2013) describe, compare and expand on some of these options in their paper.
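The question's idea of adding time as a feature can be sketched with scikit-learn alone. Rather than splitting the datetime into year, month, day, hour and minute, the sketch below encodes it as a single number (hours elapsed since the first observation) and gives the RBF kernel one length scale per input dimension, so space and time are scaled independently. This is only an illustration under assumptions: the helper `space_time_features`, the kernel and the synthetic data are not taken from the paper or the question.

```python
import numpy as np
import pandas as pd
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel, WhiteKernel


def space_time_features(df, t0):
    """Hypothetical helper: build [lat, long, hours since t0] rows."""
    hours = ((df["datetime"] - t0) / pd.Timedelta(hours=1)).to_numpy()
    return np.column_stack([df["lat"].to_numpy(), df["long"].to_numpy(), hours])


# Synthetic stand-in for meteo_df: 6 stations, 2 hours of 10-minute data
rng = np.random.default_rng(0)
stations = np.array([[52.10, 5.18], [52.30, 4.76], [51.45, 5.38],
                     [53.20, 6.57], [51.97, 4.45], [52.64, 4.98]])
times = pd.date_range("2017-01-02 10:00", periods=12, freq="10min")
rows = [
    {"datetime": ts, "lat": lat, "long": lon,
     "dew_point": 5.0 + 2.0 * (lat - 52.0) + 0.5 * i / 6.0
                  + rng.normal(scale=0.2)}
    for i, ts in enumerate(times)
    for lat, lon in stations
]
meteo_df = pd.DataFrame(rows)

t0 = meteo_df["datetime"].min()
X = space_time_features(meteo_df, t0)
y = meteo_df["dew_point"].to_numpy()

# Anisotropic RBF: separate length scales for lat, long and time (assumed)
kernel = (ConstantKernel(1.0) * RBF(length_scale=[0.5, 0.5, 1.0])
          + WhiteKernel(noise_level=0.1))
gpr = GaussianProcessRegressor(kernel=kernel, normalize_y=True, random_state=0)
gpr.fit(X, y)

# Predict at a location and a time that were not observed
query = pd.DataFrame({"lat": [52.0], "long": [5.0],
                      "datetime": [pd.Timestamp("2017-01-02 10:55")]})
mean, std = gpr.predict(space_time_features(query, t0), return_std=True)
```

Exact Gaussian process fitting scales cubically with the number of samples, so two years of 10-minute data from six stations is too much for a single fit; in practice you would fit on a moving time window or subsample.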