Time as the third dimension
Spatio-Temporal Kriging in R
projectionto create spatial data. There is no need of loading it, since the same functions are available under different names in sp. However, I prefer these two because they are easier to remember. The last packages are rgdal and rgeos, for performing various operations on geodata.
POSIXct, which are standard ways of representing time in R. This very first thing we have to do is of course setting the working directory and loading the csv file:
POSIXlt, but we first need to extract only the first 10 digits, and then convert it. This can be done in one line of code but with multiple functions:
paste(data$generation_time). This creates the character string shown above, which we can then subset using the function
substr. This function is used to subtract characters from a string and takes three arguments: a string, a starting character and a stopping character. In this case we want to basically delete the last 3 numbers from our string, so we set the start on the first number (
start=1), and the stop at 10 (
stop=10). Then we need to change the numerical string back to a numerical format, using the function
as.numeric. Now we just need one last function to tell R that this particular number is a Date/Time object. We can do this using the function
as.POSIXlt, which takes the actual number we just created plus an origin. Since we are using UNIX time, we need to set the starting point at "1970-01-01". We can test this function of the first element of the vector data$generation_time to test its output:
data.framedata has a new column named TIME where the Date/Time information are stored.
substrto extract only the numbers we need. For converting this format into degrees, we need to sum the degrees with the minutes divided by 60. So in the first part of the equation we just need to extract the first two digits of the numerical string and transform them back to numerical format. In the second part we need to extract the remaining of the strings, transform them into numbers and then divided them by 60.This operation creates some
NAs in the dataset, for which you will get a warning message. We do not have to worry about it as we can just exclude them with the following line:
data.frameobjects using Date/Time:
SpatialPointsDataFramewith coordinates in metres.
STIDFformats. The first represents objects with a complete space time grid. In other words in this category are included objects such as the grid of weather stations presented in Fig.1. The spatio-temporal object is created using the n locations of the weather stations and the m time intervals of their observations. The spatio-temporal grid is of size nxm.
STIDFobjects are the one we are going to use for this example. These are unstructured spatio-temporal objects, where both space and time change dynamically. For example, in this case we have data collected on top of trams moving around the city of Zurich. This means that the location of the sensors is not consistent throughout the sampling window.
STIDFobjects is fairly simple, we just need to disassemble the
data.framewe have into a spatial, temporal and data components, and then merge them together to create the
SpatialPointsobject, with the locations of the sensors at any given time:
ozoneSP <- SpatialPoints(ozone.UTM@coords,CRS("+init=epsg:3395"))
SpatialPointsin the package sp. This function takes two arguments, the first is a
data.framewith the coordinates of each point. In this case I used the coordinates of the
SpatialPointsDataFramewe created before, which are provided in a
matrixformat. Then I set the projection in UTM.
At this point we need to perform a very important operation for kriging, which is check whether we have some duplicated points. It may happen sometime that there are points with identical coordinates. Kriging cannot handle this and returns an error, generally in the form of a “singular matrix”. Most of the time in which this happens the problem is related to duplicated locations. So we now have to check if we have duplicates here, using the function
dupl <- zerodist(ozoneSP)
ozoneDF <- data.frame(PPB=ozone.UTM$ozone_ppb[-dupl[,2]])
data.framewith only one column, named PPB, with the ozone observations in part per billion. As you can see I removed the duplicated points by excluding the rows from the object ozone.UTM with the indexes included in one of the columns of the object dupl. We can use the same trick while creating the temporal part:
ozoneTM <- as.POSIXct(ozone.UTM$TIME[-dupl[,2]],tz="CET")
timeDF <- STIDF(ozoneSP,ozoneTM,data=ozoneDF)
STIDFobject by using the spatio-temporal version of the function
spplot, which is
variogramST. Its use is similar to the standard function for spatial kriging, even though there are some settings for the temporal component that need to be included.
variogramSTis identical to a normal call to the function
variogram; we first have the formula and then the data source. However, then we have to specify the time units (
tunits) or the time lags (
tlags). I found the documentation around this point a bit confusing to be honest. I tested various combinations of parameters and the line of code I presented is the only one that gives me what appear to be good results. I presume that what I am telling to the function is to aggregate the data to the hours, but I am not completely sure. I hope some of the readers can shed some light on this!!
I must warn you that this operation takes quite a long time, so please be aware of that. I personally ran it overnight.
Plotting the Variogram
fit.StVariogram, which are the spatio-temporal matches for
Below I present the code I used to fit all the models. For the automatic fitting I used most of the settings suggested in the following demo:
The first thing to set are the upper and lower limits for all the variogram parameters, which are used during the automatic fitting:
# lower and upper bounds pars.l <- c(sill.s = 0, range.s = 10, nugget.s = 0,sill.t = 0, range.t = 1, nugget.t = 0,sill.st = 0, range.st = 10, nugget.st = 0, anis = 0) pars.u <- c(sill.s = 200, range.s = 1000, nugget.s = 100,sill.t = 200, range.t = 60, nugget.t = 100,sill.st = 200, range.st = 1000, nugget.st = 100,anis = 700)
vgmSTdo not make much of a difference, so probably you do not have to worry too much about the parameters you select in
We can check how this model fits our data by using the function
fit.StVariogramwith the option
fit.method=0, which keeps this model but calculates its Mean Absolute Error (MSE), compared to the actual data:
> extractPar(separable_Vgm) range.s nugget.s range.t nugget.t sill 199.999323 10.000000 99.999714 1.119817 17.236256
vgmSTwe need to provide both the spatial and temporal component, plus the value of the parameter
k(which needs to be positive):
k = 5, but R returned an error message saying that it needed to be positive, which I did not understand. However, with 50 it worked and as I mentioned the automatic fit does not care much about these initial values, probably the most important things are the upper and lower bounds we set before.
We can then proceed with the fitting process and we can check the MSE with the following two lines:
stAniand creating a metric model in
vgmSTcan be done as follows:
metric <- vgmST("metric", joint = vgm(50,"Mat", 500, 0), stAni=200)
The automatic fit produces the following MSE:
We can plot this model to visually check its accuracy:
The automatic fit can be done like so:
Which creates the following model:
Simple Sum Metric
This returns a model similar to the sum metric:
Choosing the Best Model
roads <- shapefile("VEC25_str_l_Clip/VEC25_str_l.shp")
Klass1 <- roads[roads$objectval=="1_Klass",]
Klass1.UTM <- spTransform(Klass1,CRS("+init=epsg:3395"))
cropin rgeos, with the object ozone.UTM that I created before:
Klass1.cropped <- crop(Klass1.UTM,ozone.UTM)
spsampleto create a random grid of points along the road lines:
sp.grid.UTM <- spsample(Klass1.cropped,n=1500,type="random")
POSIXctvalues between the two Date/Times provided. In this case we are interested in creating a spatio-temporal data frame, since we do not yet have any data for it. Therefore we can use the function
STFto merge spatial and temporal data into a spatio-temporal grid:
grid.ST <- STF(sp.grid.UTM,tm.grid)
This can be used as new data in the kriging function.
krigeSTto perform the interpolation:
pred <- krigeST(PPB~1, data=timeDF, modelList=sumMetric_Vgm, newdata=grid.ST)
We can plot the results again using the function
vignette("st", package = "gstat")
ReferencesGräler, B., 2012. Different concepts of spatio-temporal kriging [WWW Document]. URL geostat-course.org/system/files/part01.pdf (accessed 8.18.15).
Gräler, B., Pebesma, Edzer, Heuvelink, G., 2015. Spatio-Temporal Interpolation using gstat.
Gräler, B., Rehr, M., Gerharz, L., Pebesma, E., 2013. Spatio-temporal analysis and interpolation of PM10 measurements in Europe for 2009.
Oliver, M., Webster, R., Gerrard, J., 1989. Geostatistics in Physical Geography. Part I: Theory. Trans. Inst. Br. Geogr., New Series 14, 259–269. doi:10.2307/622687
Sherman, M., 2011. Spatial statistics and spatio-temporal data: covariance functions and directional properties. John Wiley & Sons.
All the code snippets were created by Pretty R at inside-R.org