Last time, I wrote an introductory post on the Kalman filter. While that intro might be sufficient for some readers, the derivation itself is often sought, so I’d like to cover it in this post.

I’d say there are two ways of deriving the Kalman filter: 1) using the definitions of mean and covariance, and 2) working out the math of a hidden Markov model with a number of Gaussian distributions. In this post, I’ll take the first route.

## System Setup

Let us describe the linear system as follows: \begin{aligned} x_{k+1} &= F_k x_k + B_k u_k + w_k \\ z_k &= H_k x_k + v_k \end{aligned} where x_k, u_k, and z_k represent the state, control input, and measurement at time k, respectively. The matrices F_k, B_k, H_k represent the state transition model, control model, and measurement model, respectively.

We model the process and measurement noise, w_k and v_k, respectively, as zero-mean white Gaussian noise: \begin{aligned} w_k &\sim \mathcal{N}(w_k; 0, Q_k) \\ v_k &\sim \mathcal{N}(v_k; 0, R_k)\end{aligned} where \begin{aligned}Q_k &= E[(w_k - E[w_k])(w_k - E[w_k])^T] \\ R_k &= E[(v_k - E[v_k])(v_k - E[v_k])^T] \end{aligned}.
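Before deriving anything, it can help to see the system in action. Here is a minimal NumPy sketch that simulates the model above; the concrete matrices (a 1D constant-velocity model with an acceleration input) are made-up examples of mine, not anything from the derivation:

```python
import numpy as np

def simulate(F, B, H, Q, R, x0, us, rng=np.random.default_rng(0)):
    """Simulate x_{k+1} = F x_k + B u_k + w_k and z_{k+1} = H x_{k+1} + v_{k+1}.

    Returns states x_1..x_N and the corresponding measurements."""
    xs, zs = [x0], []
    x = x0
    for u in us:
        w = rng.multivariate_normal(np.zeros(Q.shape[0]), Q)  # process noise
        v = rng.multivariate_normal(np.zeros(R.shape[0]), R)  # measurement noise
        x = F @ x + B @ u + w
        xs.append(x)
        zs.append(H @ x + v)
    return np.array(xs), np.array(zs)

# example: state = [position, velocity], input = acceleration
dt = 1.0
F = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.5 * dt**2], [dt]])
H = np.array([[1.0, 0.0]])          # we observe position only
Q = 0.01 * np.eye(2)
R = np.array([[0.1]])
xs, zs = simulate(F, B, H, Q, R, np.zeros(2), [np.array([1.0])] * 5)
```

With Q and R set to zero, the trajectory reduces to the deterministic recursion x_{k+1} = F_k x_k + B_k u_k, which is a handy sanity check.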

Let’s define the current state estimate at time k by its conditional mean and covariance: \begin{aligned}E[x_k|z_k] &= \mu_k \\ E[(x_k - E[x_k|z_k])(x_k - E[x_k|z_k])^T|z_k] &= \Sigma_k \end{aligned} .

## Prediction

Now, note that the expectation operator is a linear operator. So, if we take the expectation of the predicted state: \begin{aligned} \bar{\mu}_{k+1} &= E[x_{k+1}|z_k] \\ &= E[F_k x_k + B_k u_k + w_k|z_k] \\ &= E[F_k x_k|z_k] + E[B_k u_k|z_k] + E[w_k|z_k] \\ &= F_k E[x_k|z_k] + B_k E[u_k|z_k] + E[w_k|z_k] \\ &= F_k \mu_k + B_k u_k \end{aligned} Note that E[u_k|z_k] = u_k because the control input is known (deterministic), and the term E[w_k|z_k] disappears because the expectation of the process noise is zero. To compute the predicted covariance: \begin{aligned} \bar{\Sigma}_{k+1} &= E[(x_{k+1}-E[x_{k+1}|z_k])(x_{k+1}-E[x_{k+1}|z_k])^T|z_k] \\ &= E[(F_k x_k + B_k u_k + w_k - F_k E[x_k|z_k] - B_k u_k)(F_k x_k + B_k u_k + w_k - F_k E[x_k|z_k] - B_k u_k)^T] \\ &= E[(F_k (x_k - E[x_k|z_k]) + w_k)(F_k (x_k - E[x_k|z_k]) + w_k)^T] \\ &= F_k E[(x_k-E[x_k|z_k])(x_k-E[x_k|z_k])^T]F_k^T + F_k E[(x_k-E[x_k|z_k])w_k^T] \\ &\quad + E[w_k (x_k-E[x_k|z_k])^T]F_k^T + E[w_k w_k^T] \\ &= F_k E[(x_k - E[x_k|z_k]) (x_k - E[x_k|z_k])^T] F_k^T + E[w_k w_k^T]\\ &= F_k \Sigma_k F_k^T + Q_k\end{aligned} In the expansion, the cross terms between the state error and the process noise drop out because the noise is zero-mean, white, and uncorrelated with the state.
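The two prediction equations translate directly into code. A minimal sketch (the function name and signature are my own choices, not a standard API):

```python
import numpy as np

def predict(mu, Sigma, F, B, u, Q):
    """Kalman prediction step.

    mu_bar    = F mu + B u
    Sigma_bar = F Sigma F^T + Q
    """
    mu_bar = F @ mu + B @ u
    Sigma_bar = F @ Sigma @ F.T + Q
    return mu_bar, Sigma_bar
```

Note that the covariance can only grow (or stay the same) in this step: propagating through F reshapes the uncertainty, and adding Q inflates it.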

## Measurement Update

### Conditional distribution of multivariate Gaussian distribution

Before we dive into the measurement update equations’ derivation, we need to know how to obtain the conditional distribution (i.e., p(X|Z)) of a jointly Gaussian random vector. Suppose we have two random vectors X and Z which are jointly Gaussian (or multivariate Gaussian): \begin{aligned}\begin{bmatrix}X\\Z\end{bmatrix} \sim \mathcal{N} \left(\begin{bmatrix}X \\ Z\end{bmatrix}; \begin{bmatrix}\bar{X} \\ \bar{Z}\end{bmatrix}, \begin{bmatrix}P_{XX} & P_{XZ} \\ P_{ZX} & P_{ZZ}\end{bmatrix} \right) \end{aligned} where \begin{aligned}P_{XX} &= \mathrm{Cov}(X, X) = E[(X - E[X])(X - E[X])^T] \\ P_{ZZ} &= \mathrm{Cov}(Z, Z) = E[(Z - E[Z])(Z - E[Z])^T] \\ P_{XZ} &= \mathrm{Cov}(X, Z) = E[(X - E[X])(Z - E[Z])^T] = P_{ZX}^T\end{aligned}

The conditional distribution p(X|Z) is computed by p(X|Z)=\frac{p(X,Z)}{p(Z)}. Once this calculation is carried out, you’ll end up with the following equations: \begin{aligned}p(X|Z) &= \mathcal{N}(X; \hat{X}, \hat{P}_{XX})\end{aligned} where the mean and covariance are: \begin{aligned} E[X|Z] &= \hat{X} \\ &= \bar{X} + P_{XZ} P_{ZZ}^{-1}(Z-\bar{Z}) \\ \mathrm{Cov}(X,X|Z) &= \hat{P}_{XX} \\ &= P_{XX} - P_{XZ} P_{ZZ}^{-1} P_{ZX}\end{aligned}
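These formulas are easy to implement and check numerically. A small sketch (the scalar joint covariance used in the example is an arbitrary choice of mine):

```python
import numpy as np

def gaussian_conditional(x_bar, z_bar, Pxx, Pxz, Pzz, z):
    """Mean and covariance of X | Z = z for jointly Gaussian (X, Z)."""
    K = Pxz @ np.linalg.inv(Pzz)   # the gain-like term P_XZ P_ZZ^{-1}
    x_hat = x_bar + K @ (z - z_bar)
    P_hat = Pxx - K @ Pxz.T        # P_XX - P_XZ P_ZZ^{-1} P_ZX, since P_ZX = P_XZ^T
    return x_hat, P_hat

# scalar example: Var(X) = 2, Var(Z) = 2, Cov(X, Z) = 1, observed Z = 2
x_hat, P_hat = gaussian_conditional(
    np.array([0.0]), np.array([0.0]),
    np.array([[2.0]]), np.array([[1.0]]), np.array([[2.0]]),
    np.array([2.0]))
# conditional mean: 0 + (1/2)*2 = 1; conditional variance: 2 - 1*(1/2)*1 = 1.5
```

Notice that the conditional covariance never exceeds the prior covariance: observing Z can only reduce (or leave unchanged) our uncertainty about X.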

### Measurement Update Equations

Once the measurement z_{k+1} arrives, it is used to update the predicted estimate, which gives the posterior distribution p(x_{k+1}|z_{k+1}). Using the equations above, the posterior mean is: \begin{aligned}E[x_{k+1}|z_{k+1}] &= E[x_{k+1}|z_k] + P_{x_{k+1} z_{k+1}|z_k} P_{z_{k+1} z_{k+1}|z_k}^{-1} (z_{k+1} - \bar{z}_{k+1}) \\ &= \bar{\mu}_{k+1}+ P_{x_{k+1} z_{k+1}|z_k} P_{z_{k+1} z_{k+1}|z_k}^{-1} (z_{k+1} - E[z_{k+1}|z_k]) \\ &= \bar{\mu}_{k+1} + P_{x_{k+1} z_{k+1}|z_k} P_{z_{k+1} z_{k+1}|z_k}^{-1} (z_{k+1} - E[H_{k+1}x_{k+1}+ v_{k+1}|z_k]) \\ &= \bar{\mu}_{k+1} + P_{x_{k+1} z_{k+1}|z_k} P_{z_{k+1} z_{k+1}|z_k}^{-1} (z_{k+1} - H_{k+1}\bar{\mu}_{k+1}) \end{aligned}

Similarly, the posterior covariance is: \begin{aligned}\mathrm{Cov}(x_{k+1}, x_{k+1}|z_{k+1}) &= P_{x_{k+1} x_{k+1}|z_k} - P_{x_{k+1} z_{k+1}|z_k} P_{z_{k+1} z_{k+1}|z_k}^{-1} P_{z_{k+1} x_{k+1}|z_k} \end{aligned}

Let’s find the equations for those covariance terms: \begin{aligned} P_{x_{k+1} x_{k+1}|z_k} &= \mathrm{Cov}(x_{k+1},x_{k+1}|z_k) \\ &= \bar{\Sigma}_{k+1} \\ P_{x_{k+1} z_{k+1}|z_k} &= \mathrm{Cov}(x_{k+1}, z_{k+1}|z_k) \\ &= E[(x_{k+1}-E[x_{k+1}|z_k])(z_{k+1}-E[z_{k+1}|z_k])^T] \\ &= E[(x_{k+1}-\bar{\mu}_{k+1})(H_{k+1}x_{k+1} + v_{k+1} - H_{k+1}\bar{\mu}_{k+1})^T] \\ &= E[(x_{k+1}-\bar{\mu}_{k+1})(H_{k+1}(x_{k+1}-\bar{\mu}_{k+1}) + v_{k+1})^T] \\ &= E[(x_{k+1}-\bar{\mu}_{k+1})(x_{k+1}-\bar{\mu}_{k+1})^T] H_{k+1}^T + E[(x_{k+1}-\bar{\mu}_{k+1})v_{k+1}^T] \\ &= \bar{\Sigma}_{k+1} H_{k+1}^T \\ P_{z_{k+1} z_{k+1}|z_k} &= \mathrm{Cov}(z_{k+1}, z_{k+1}|z_k) \\ &= E[(z_{k+1}-E[z_{k+1}|z_k])(z_{k+1}-E[z_{k+1}|z_k])^T] \\ &= E[(H_{k+1}x_{k+1} + v_{k+1} - H_{k+1}\bar{\mu}_{k+1}) (H_{k+1}x_{k+1} + v_{k+1} - H_{k+1}\bar{\mu}_{k+1})^T] \\ &= E[(H_{k+1}(x_{k+1}-\bar{\mu}_{k+1}) + v_{k+1})(H_{k+1}(x_{k+1}-\bar{\mu}_{k+1}) + v_{k+1})^T] \\ &= H_{k+1} E[(x_{k+1}-\bar{\mu}_{k+1})(x_{k+1}-\bar{\mu}_{k+1})^T]H_{k+1}^T + H_{k+1}E[(x_{k+1}-\bar{\mu}_{k+1})v_{k+1}^T] \\ &\quad + E[v_{k+1} (x_{k+1}-\bar{\mu}_{k+1})^T]H_{k+1}^T + E[v_{k+1}v_{k+1}^T] \\ &= H_{k+1} \bar{\Sigma}_{k+1} H_{k+1}^T + R_{k+1}\end{aligned} In both derivations, the cross terms involving v_{k+1} vanish because the measurement noise is zero-mean and independent of the state.

Putting these together, we get the Kalman filter measurement update equations: \begin{aligned} E[x_{k+1}|z_{k+1}] &= \bar{\mu}_{k+1} + K_{k+1}(z_{k+1} - H_{k+1}\bar{\mu}_{k+1}) \\ \mathrm{Cov}(x_{k+1},x_{k+1}|z_{k+1}) &= \bar{\Sigma}_{k+1} - K_{k+1} H_{k+1} \bar{\Sigma}_{k+1}^T \\ &= \bar{\Sigma}_{k+1} - K_{k+1} H_{k+1} \bar{\Sigma}_{k+1} \\ &= (I - K_{k+1}H_{k+1})\bar{\Sigma}_{k+1}\end{aligned} where the second line uses the symmetry of \bar{\Sigma}_{k+1}, and the Kalman gain is \begin{aligned}K_{k+1} &= P_{x_{k+1} z_{k+1}|z_k} P_{z_{k+1} z_{k+1}|z_k}^{-1} \\ &= \bar{\Sigma}_{k+1} H_{k+1}^T (H_{k+1} \bar{\Sigma}_{k+1} H_{k+1}^T + R_{k+1})^{-1}\end{aligned}
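The gain and the two update equations can be sketched in a few lines of NumPy (again, the function name and signature are my own, and the explicit matrix inverse is fine for a sketch even if a solver would be preferred numerically):

```python
import numpy as np

def update(mu_bar, Sigma_bar, H, R, z):
    """Kalman measurement update given the predicted mean and covariance."""
    S = H @ Sigma_bar @ H.T + R                        # innovation covariance
    K = Sigma_bar @ H.T @ np.linalg.inv(S)             # Kalman gain
    mu = mu_bar + K @ (z - H @ mu_bar)                 # posterior mean
    Sigma = (np.eye(len(mu_bar)) - K @ H) @ Sigma_bar  # posterior covariance
    return mu, Sigma

# example: equal prior and measurement uncertainty -> split the difference
mu, Sigma = update(np.zeros(2), 2 * np.eye(2),
                   np.eye(2), 2 * np.eye(2), np.array([2.0, 0.0]))
# gain is 0.5 I, so the posterior mean moves halfway toward the measurement
```

A full filter then just alternates a prediction step with this update step, carrying (mu, Sigma) forward in time.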