DIFM:Dynamic ICAR Spatiotemporal Factor Models

Hwasoo Shin, Marco A. R. Ferreira

2024-04-11

Introduction

This package DIFM provides codes to run Dynamic ICAR Spatiotemporal Factor Models (DIFM). It includes codes to initialize parameters, run MCMC, evaluate models, and generate plots. Read (Shin and Ferreira, 2023) for more details.

The vignette presents the model description of DIFM and the assumptions for common factors and factor loadings. We provide an example of crime rates in the western states of United States.

Model Description

We assume that the region of study is partitioned into \(r\) subregions. In our motivating example, the subregions are the states in the contiguous United States. The variable of interest in each subregion is observed at \(n\) time points. Let \(\mathbf{y}_t\) be the \(r\)-dimensional vector of observations at time \(t\) \((t = 1, 2, \dots, n.)\)

We assume that the spatiotemporal behavior of the \(r\) subregions can be represented by \(k\) factors, where usually \(k\) is much smaller than \(r\). Specifically, we assume the model

\[\begin{equation}\label{MainEQ1} \mathbf{y}_t = \mathbf{B} \mathbf{x}_t + \mathbf{v}_t, \tag{1} \end{equation}\]

where \(\mathbf{x}_t\) is the \(k\)-dimensional vector of factors at time \(t\), \(\mathbf{B}\) is an \(r \times k\) matrix of factor loadings, and \(\mathbf{v}_t\) is the \(r\)-dimensional vector of errors at time \(t\). We assume that the observational error vector \(\mathbf{v}_t\), \(t = 1, 2, \dots, n\) is independent over time and follows a Gaussian distribution \(\mathbf{v}_t \sim \text{N}(0, \mathbf{V})\), where \(\mathbf{V} = \mbox{diag}(\sigma^2_1, \dots, \sigma^2_r)\). Each of the variances \(\sigma^2_1, \dots, \sigma^2_r\) is specific to one of the \(r\) subregions, and thus they are known as idiosyncratic variances.

We assume that the vector of factors \(\mathbf{x}_t\) follows a dynamic linear model (West and Harrison, 1997; Prado et al., 2021). Specifically, we assume the general model \[\begin{eqnarray}\label{MainEQ2} \mathbf{x}_t &=& \mathbf{F} \mathbf{\theta}_t, \tag{2} \\ \mathbf{\theta}_t &=& \mathbf{G} \mathbf{\theta}_{t-1} + \mathbf{\omega}_t, \mathbf{\omega}_t \sim \text{N}(0, \mathbf{W}), \tag{3} \label{MainEQ3} \end{eqnarray}\] where \(\mathbf{\theta}_t\) is a latent process that allows great flexibility in the description of the temporal evolution of \(\mathbf{x}_t\). Specifically, \(\mathbf{\theta}_t\) may encode different types of temporal trends as well as seasonality. For example, in our application we assume a second-order polynomial DLM and specify \(\theta_t\) as a vector of dimension \(2k\) that contains the level and the gradient of \(\mathbf{x}_t\) at time \(t\). In addition, the evolution matrix \(\mathbf{G}\) describes the temporal evolution of the latent process \(\mathbf{\theta}_t\). Further, \(\mathbf{\omega}_t\) is a \(2k\)-dimensional innovation vector with a dense covariance matrix \(\mathbf{W}\). Finally, the matrix \(\mathbf{F}\) relates the vector of common factors \(\mathbf{x}_t\) to the appropriate elements of the latent process \(\mathbf{\theta}_t\).

In the case of the second-order polynomial DLM that we consider, \(\mathbf{\theta}_t = (\theta_{t,1} , \theta_{t,2}, \dots , \mathbf{\theta}_{t,2k})^T\) is a vector of dimension \(2k\) where \((\theta_{t,1}, \theta_{t,3}, \dots, \theta_{t,2k-1})^T\) and \((\theta_{t,2}, \theta_{t,4}, \dots, \theta_{t,2k})^T\) are respectively the level and the gradient of the vector of common factors \(\mathbf{x}_t\). Thus, the matrix \(\mathbf{F}\) that relates \(\mathbf{x}_t\) to \(\mathbf{\theta}_t\) is a \(k \times 2k\) matrix of the form \[\begin{equation*} \mathbf{F} = \left[ \begin{array}{cccccc} 1 & 0 & 0 & \dots & 0 & 0\\ 0 & 0 & 1 & \dots & 0 & 0\\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots\\ 0 & 0 & 0 & \dots & 1 & 0 \end{array} \right]. \end{equation*}\] The evolution matrix \(\mathbf{G}\) has dimension \(2k \times 2k\) and satisfies \(\theta_{t,2j-1} = \theta_{t-1,2j-1} + \theta_{t-1,2j} + \omega_{t, 2j - 1}\) and \(\theta_{t,2j} = \theta_{t-1,2j} + \omega_{t,2j}\), \(j = 1, \dots, k\). Therefore, \(\mathbf{G}\) = \(blockdiag(\mathbf{G}_0, \dots ,\mathbf{G}_0)\) where \[\begin{equation*} \mathbf{G}_0= \left[ \begin{array}{cc} 1 & 1\\ 0 & 1 \end{array} \right]. \end{equation*}\]

The specification of the factor loadings matrix \(\mathbf{B}\) is crucial in our dynamic ICAR factor model. An important point to consider is the need for constraints on the matrix \(\mathbf{B}\) to ensure identifiability of the model. Specifically, for any invertible \(k \times k\) matrix \(\mathbf{A}\), substituting \(\mathbf{B}\) and \(\mathbf{x}_t\) in Equation (1) by, respectively, \(\mathbf{B}^* = \mathbf{B} \mathbf{A}\) and \(\mathbf{x}_t^* = \mathbf{A}^{-1} \mathbf{x}_t\) would lead to the same model. To ensure identifiability, we impose a hierarchical structural constraint that assumes that \(\mathbf{B}\) is a full-rank block lower triangular matrix with diagonal elements equal to 1 (Aguilar and West, 2000). Specifically, we assume \(\mathbf{B}\) has the form \[\begin{equation*} \mathbf{B} = \begin{bmatrix} 1 & 0 & 0 & \dots & 0\\ b_{2,1} & 1 & 0 & \dots & 0\\ b_{3,1} & b_{3,2} & 1 & \dots & 0\\ \vdots & \vdots & \vdots & \ddots & \vdots\\ b_{k,1} & b_{k,2} & b_{k,3} & \dots & 1\\ b_{k+1,1} & b_{k+1,2} & b_{k+1,3} & \dots & b_{k+1,k}\\ \vdots & \vdots & \vdots & \ddots & \vdots\\ b_{r,1} & b_{r,2} & b_{r,3} & \dots & b_{r,k} \end{bmatrix}. \end{equation*}\]

To account for the spatial dependence among the factor loadings for neighboring subregions, we assume that each column of the matrix of factor loadings \(\mathbf{B}\) follows an intrinsic conditional autoregressive model (Besag et al., 1991; Keefe et al., 2018). Specifically, we assume for the \(j\)th column \(\mathbf{B}_j, j = 1, \dots, k,\) the density \[\begin{equation}\label{MainEQ4} p(\mathbf{B}_j) \propto \exp\left(-\frac{1}{2\tau_j} \mathbf{B}_j^T \mathbf{H} \mathbf{B}_j\right), \end{equation}\] where \(\mathbf{H}\) is a precision matrix that accounts for the spatial dependence among neighboring subregions and \(\tau_j\) controls the strength of spatial correlation among factor loadings. Specifically, if subregions \(i\) and \(j\) are neighbors, then the corresponding element of the matrix \(\mathbf{H}\) is \(h_{ij} = -g_{ij}\) where \(g_{ij}\) measures the strength of the association between subregions \(i\) and \(j\). If subregions \(i\) and \(j\) are not neighbors, then \(g_{ij} = 0\). Finally, the \(i\)th diagonal element of matrix \(\mathbf{H}\) is \(h_{ii} = \sum_{j \neq i} g_{ij}\). For example, a widely used choice for \(\mathbf{H}\) assumes \(g_{ij} = 1\) if \(i\) and \(j\) share a border, and \(g_{ij} = 0\) otherwise. In that case, \(h_{ii}\) is equal to the number of neighbors of subregion \(i\). Further, we assume that there are no islands which implies that the matrix \(\mathbf{H}\) has one eigenvalue equal to 0 and all other eigenvalues larger than zero. Note that we assume this prior for each column of \(\mathbf{B}\). Let \(\mathbf{B}_{.j}^*=\mathbf{B}_{(j+1):r, j}\) be the \(j\)th column of \(\mathbf{B}\) without the first \(j\) elements that are fixed. In addition, let \(\mathbf{H}_j^* = \mathbf{H}_{(j+1):r, (j+1):r}\). Then, the conditional distribution of \(\mathbf{B}_{.j}^*\) given \(\mathbf{B}_{1:j, j}=(0,\ldots,0,1)^T\) is multivariate normal with mean vector \(\mathbf{h}_j = -\mathbf{H}_j^{*-1} \mathbf{H}_{(j+1):r, j}\) and precision matrix \(\mathbf{H}_j^*\).

Examples

We apply DIFM to western United States crime datasets. The data was collected from Bureau of Justice Statistics and available at disaster center website. In this vignette, we provide two datasets, Violent and Property for violent and property crime, respectively. The data was collected from 50 states of United States and District of Columbia from 1960 to 2019. In this example, we use the WestStates data included in DIFM package that contains the information of the map and polygon of the 11 western states: Arizona, California, Colorado, Idaho, Montana, Nevada, New Mexico, Oregon, Utah, Washington and Wyoming. The numbers represent the cases of crime per 100,000 people. We use the square root of the data to stabilize the variance.

Step 1: Read and explore the data

data(Violent)
data(Property)
data(WestStates)
Violent <- as.matrix(Violent)
Violent <- sqrt(Violent)
Property <- as.matrix(Property)
Property <- sqrt(Property)

After we call the datasets, we explore the data through plots.

par(mar=c(2,2,2,2))
layout(rbind(1:4, 5:8, c(9:11,0)))
for(i in 1:11){
  plot(1960:2019, Violent[,i], main = colnames(Violent)[i], type = "l", xlab = "", ylab = "")
}
Figure 1: Number of violent crimes

Figure 1: Number of violent crimes

Figure 1 shows the square root of the number of violent crimes in Western states. In most of the states, the cases of violent crimes soar by 2000 and have small changes. The trend after 2000 differ by states. Since many states share similar trend, we can assume that DIFM would be applicable in this case.

par(mar=c(2,2,2,2))
layout(rbind(1:4, 5:8, c(9:11,0)))
for(i in 1:11){
  plot(1960:2019, Property[,i], main = colnames(Property)[i], type = "l", xlab = "", ylab = "")
}
Figure 2: Number of property crimes

Figure 2: Number of property crimes

Figure 2 shows the square root of the number of property crimes in Western states. Property crimes soar by 1980 in western states, but would usually decrease from then. Many states show start of sharp decrease between 1980 and 1990. These information can be reperesented with smaller number of factors through DIFM.

Step 2: Run DIFM with range of factors.

Now we run DIFM with different number of factors. For our two examples, we try models from 1 to 4 factors. Before we start DIFM, we should permute the order of the variable to adjust the structural hierarchical constraint. We set the variable that would represent the factor well according to the eigenvectors. From Figure 1 and Figure 2, we can find that the time serires are rather non-stationary than stationary. Therefore, we consider a second order polynomial for the dynamic linear model. First, we run MCMC for the violent crime data.

n.iter <- 5000
n.save <- 10
G0 <- rbind(c(1,1), c(0,1))
Violent.permutation <- permutation.order(Violent, 4)
Violent <- Violent[,Violent.permutation]
Violent.Hlist <- buildH(WestStates, Violent.permutation)

set.seed(1101)

model.attributes1V <- difm.model.attributes(Violent, n.iter, n.factors = 1, G0)
hyp.parm1V <- difm.hyp.parm(model.attributes1V, Hlist = Violent.Hlist)
ViolentDIFM1 <- DIFMcpp(model.attributes1V, hyp.parm1V, Violent, every = n.save, verbose = FALSE)
ViolentAssess1 <- marginal_d_cpp(Violent, model.attributes1V, hyp.parm1V, ViolentDIFM1, verbose = FALSE)

model.attributes2V <- difm.model.attributes(Violent, n.iter, n.factors = 2, G0)
hyp.parm2V <- difm.hyp.parm(model.attributes2V, Hlist = Violent.Hlist)
ViolentDIFM2 <- DIFMcpp(model.attributes2V, hyp.parm2V, Violent, every = n.save, verbose = FALSE)
ViolentAssess2 <- marginal_d_cpp(Violent, model.attributes2V, hyp.parm2V, ViolentDIFM2, verbose = FALSE)

model.attributes3V <- difm.model.attributes(Violent, n.iter, n.factors = 3, G0)
hyp.parm3V <- difm.hyp.parm(model.attributes3V, Hlist = Violent.Hlist)
ViolentDIFM3 <- DIFMcpp(model.attributes3V, hyp.parm3V, Violent, every = n.save, verbose = FALSE)
ViolentAssess3 <- marginal_d_cpp(Violent, model.attributes3V, hyp.parm3V, ViolentDIFM3, verbose = FALSE)

model.attributes4V <- difm.model.attributes(Violent, n.iter, n.factors = 4, G0)
hyp.parm4V <- difm.hyp.parm(model.attributes4V, Hlist = Violent.Hlist)
ViolentDIFM4 <- DIFMcpp(model.attributes4V, hyp.parm4V, Violent, every = n.save, verbose = FALSE)
ViolentAssess4 <- marginal_d_cpp(Violent, model.attributes4V, hyp.parm4V, ViolentDIFM4, verbose = FALSE)

Now we run the MCMC for the property crime data.

Property.permutation <- permutation.order(Property, 4)
Property <- Property[,Property.permutation]
Property.Hlist <- buildH(WestStates, Property.permutation)

set.seed(1101)

model.attributes1P <- difm.model.attributes(Property, n.iter, n.factors = 1, G0)
hyp.parm1P <- difm.hyp.parm(model.attributes1P, Hlist = Property.Hlist)
PropertyDIFM1 <- DIFMcpp(model.attributes1P, hyp.parm1P, Property, every = n.save, verbose = FALSE)
PropertyAssess1 <- marginal_d_cpp(Property, model.attributes1P, hyp.parm1P, PropertyDIFM1, verbose = FALSE)

model.attributes2P <- difm.model.attributes(Property, n.iter, n.factors = 2, G0)
hyp.parm2P <- difm.hyp.parm(model.attributes2P, Hlist = Property.Hlist)
PropertyDIFM2 <- DIFMcpp(model.attributes2P, hyp.parm2P, Property, every = n.save, verbose = FALSE)
PropertyAssess2 <- marginal_d_cpp(Property, model.attributes2P, hyp.parm2P, PropertyDIFM2, verbose = FALSE)

model.attributes3P <- difm.model.attributes(Property, n.iter, n.factors = 3, G0)
hyp.parm3P <- difm.hyp.parm(model.attributes3P, Hlist = Property.Hlist)
PropertyDIFM3 <- DIFMcpp(model.attributes3P, hyp.parm3P, Property, every = n.save, verbose = FALSE)
PropertyAssess3 <- marginal_d_cpp(Property, model.attributes3P, hyp.parm3P, PropertyDIFM3, verbose = FALSE)

model.attributes4P <- difm.model.attributes(Property, n.iter, n.factors = 4, G0)
hyp.parm4P <- difm.hyp.parm(model.attributes4P, Hlist = Property.Hlist)
PropertyDIFM4 <- DIFMcpp(model.attributes4P, hyp.parm4P, Property, every = n.save, verbose = FALSE)
PropertyAssess4 <- marginal_d_cpp(Property, model.attributes4P, hyp.parm4P, PropertyDIFM4, verbose = FALSE)

We select the best model through the Metropolis-Laplace estimator of the predictive density.

PDtable <- matrix(NA, 2, 4)
PDtable[1,] <- c(ViolentAssess1$Maximum, ViolentAssess2$Maximum, ViolentAssess3$Maximum, ViolentAssess4$Maximum)
PDtable[2,] <- c(PropertyAssess1$Maximum, PropertyAssess2$Maximum, PropertyAssess3$Maximum, PropertyAssess4$Maximum)
PDtable <- as.data.frame(PDtable)
rownames(PDtable) <- c("Violent", "Property")
colnames(PDtable) <- paste("Factors =", 1:4)
kable(PDtable)
Factors = 1 Factors = 2 Factors = 3 Factors = 4
Violent -1422.399 -1408.117 -1415.552 -1548.975
Property -2083.176 -2135.292 -2200.640 -2420.407

From the table, we can find that the best models according to the evaluation method are 2-factor-model for violent crime and 1-factor-model for property crime datasets.

We present a few posterior distribution plots of the 4-factor models. The plots below are from the violent crime data.

oldpar <- par(mar = c(2,2,2,2))
plot_B.CI(ViolentDIFM4, permutation = Violent.permutation)

plot_X.CI(ViolentDIFM4)

par(oldpar)
plot_sigma2.CI(ViolentDIFM4, permutation = Violent.permutation)

plot_tau.CI(ViolentDIFM4)

plot_B.spatial(ViolentDIFM4, WestStates, layout.dim = c(2,2))

In the same manner, we present the results of the property crime data.

oldpar <- par(mar = c(2,2,2,2))
plot_B.CI(PropertyDIFM4, permutation = Property.permutation)

plot_X.CI(PropertyDIFM4)

par(oldpar)
plot_sigma2.CI(PropertyDIFM4, permutation = Property.permutation)

plot_tau.CI(PropertyDIFM4)

plot_B.spatial(PropertyDIFM4, WestStates, layout.dim = c(2,2))

Reference

Shin, H. and Ferreira, M. A. (2023). “Dynamic ICAR Spatiotemporal Factor Models.” Spatial Statistics, 56, 100763.

West, M. and Harrison, J. (1997). Bayesian Forecasting and Dynamic Models (2nd Ed.). Berlin, Heidelberg: Springer-Verlag.

Prado, R., Ferreira, M. A. R., and West, M. (2021). Time Series: Modeling, Computation, and Inference 2nd Ed. Boca Raton: Chapman & Hall/CRC.

Aguilar, O. and West, M. (2000). “Bayesian dynamic factor models and portfolio allocation.” Journal of Business and Economic Statistics, 18, 338–357.

Besag, J., York, J., and Mollie, A. (1991). “Bayesian image restoration, with two applications in spatial statistics.” Annals of the Institute of Statistical Mathematics, 43, 1

Keefe, M. J., Ferreira, M. A. R., and Franck, C. T. (2018). “On the formal specification of sum-zero constrained intrinsic conditional autoregressive models.” Spatial Statistics, 24.