Kalman Filter Derivation – HMM

In my last post, I talked about the Kalman filter derivation using the expectation operator and mentioned that we can also derive the Kalman filter using a hidden Markov model (HMM) with Gaussian distributions. So in this post, I'll cover that derivation.

Kalman Filter in Hidden Markov Model

A Markov model “describes a sequence of possible events in which the probability of each event depends only on the state attained in the previous event” (source: wikipedia). A hidden Markov model is a Markov model where the state is unobserved.

The figure above shows a probabilistic graphical model of the HMM used in the Kalman filter. It's quite simple to read. A filter is initialized at time 0 with p(x_0). As time progresses, the time index increases and measurements become available. As mentioned in the previous two posts, the Kalman filter takes two steps: prediction and measurement update. The prediction step propagates the current state to the next time step. From time 0 to 1, we compute p(x_1). Then we do a measurement update and get p(x_1|z_1). Then we again predict for the next time step and compute p(x_2|z_1), and do a measurement update to compute p(x_2|z_{1:2}). Now, it is important to note that the posterior state at time 2, p(x_2|z_{1:2}), is computed given both measurements at times 1 and 2. Note that z_{a:b} represents all measurements from time a to b, z_{a:b}=z_a, z_{a+1},\cdots, z_{b-1}, z_b.

This could become a big issue: as time increases, the number of measurements to keep track of grows linearly, so computing the posterior distribution p(x_k|z_{1:k}) would require more and more computation as k increases. Fortunately, this is where the Markov model helps. The Markov model lets us describe the state at each time step given only the previous state. That is, we can compute p(x_k|z_{1:k}) or p(x_k|z_{1:k-1}) without fusing all the measurements up to time k-1, z_{1:k-1}, as long as we know the previous posterior p(x_{k-1}|z_{1:k-1}). Let's look at the math a little bit here.

    \begin{equation*} \begin{aligned} p(x_k|z_{1:k-1}) &= \int p(x_k, x_{k-1}|z_{1:k-1}) dx_{k-1} \\ &= \int p(x_k|x_{k-1},z_{1:k-1}) \cdot p(x_{k-1}|z_{1:k-1}) dx_{k-1} \\ &= \int p(x_k|x_{k-1}) \cdot p(x_{k-1}|z_{1:k-1}) dx_{k-1} \end{aligned} \end{equation*}

Note that p(x_k|x_{k-1},z_{1:k-1}) = p(x_k|x_{k-1}) is due to the Markov assumption, under which all measurements up to time k-1 become independent of x_k given x_{k-1}. If you're familiar with directed graphs in probabilistic graphical models, you'll notice this conditional independence from the diagram above as well.

Recursive Formulation

Given the Markov assumption, we can derive a recursive formulation of the Kalman filter.

    \begin{equation*} \begin{aligned} p(x_k|z_{1:k}) &= p(x_k|z_{1:k-1}, z_k) \\ &= \frac{p(z_k|x_k, z_{1:k-1})\cdot p(x_k|z_{1:k-1})}{p(z_k|z_{1:k-1})} & \text{(Bayes rule)} \\ &= \frac{p(z_k|x_k)\cdot p(x_k|z_{1:k-1})}{p(z_k|z_{1:k-1})} & \text{(Markov assumption)} \\ &= \frac{1}{\eta_k} \cdot p(z_k|x_k) \int p(x_k,x_{k-1}|z_{1:k-1}) dx_{k-1} \\ &= \frac{1}{\eta_k} \cdot p(z_k|x_k) \int p(x_k|x_{k-1}) \cdot p(x_{k-1}|z_{1:k-1}) dx_{k-1} \end{aligned} \end{equation*}

where \eta_k = p(z_k|z_{1:k-1}) is a normalization constant. So now we can recursively express the posterior distribution at time k, p(x_k|z_{1:k}), as a function of the posterior distribution at the previous time step, p(x_{k-1}|z_{1:k-1}).
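
To make the recursion concrete before we plug in Gaussians, here is a minimal numerical sketch (a toy example of my own, not part of the derivation) that runs the recursion on a discretized 1-D state space, so the marginalization integral becomes a sum. The transition model, likelihood, and measurement values below are all made-up assumptions for illustration.

    import numpy as np

    # Toy grid-based version of the recursion above (illustrative only).
    x = np.linspace(-10.0, 10.0, 401)    # discretized 1-D state space
    dx = x[1] - x[0]

    def gauss(mean, var):
        return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

    posterior = gauss(0.0, 1.0)                               # p(x_0)
    # transition[i, j] = p(x_k = x[j] | x_{k-1} = x[i]); a made-up random-walk model
    transition = np.array([gauss(xi, 0.5) for xi in x])

    for z in [0.8, 1.5, 2.1]:                                 # made-up measurements z_k
        # prediction: p(x_k|z_{1:k-1}) = integral of p(x_k|x_{k-1}) p(x_{k-1}|z_{1:k-1}) dx_{k-1}
        prior = transition.T @ posterior * dx
        # measurement update: p(x_k|z_{1:k}) proportional to p(z_k|x_k) p(x_k|z_{1:k-1})
        likelihood = np.exp(-0.5 * (z - x) ** 2 / 0.25)       # p(z_k|x_k), made-up noise
        posterior = likelihood * prior
        posterior /= posterior.sum() * dx                     # divide by eta_k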

Plugging in Gaussian Distributions

With the recursive formulation obtained above, let’s define the system model using Gaussian distributions. 

    \begin{equation*} \begin{aligned} p(x_{k+1}|x_k) &= \mathcal{N}(x_{k+1};F_k x_k + B_k u_k, Q_k)  \\ p(z_k|x_k) &= \mathcal{N}(z_k; H_k x_k, R_k) \end{aligned} \end{equation*}

where Q_k = E[w_k w_k^T] and R_k=E[v_k v_k^T] are the process and measurement noise covariance matrices, respectively. Note that the notation \mathcal{N}(x; a, b) represents a Gaussian distribution of a random variable x whose mean is a and covariance is b.
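
For concreteness, here is one way this model might look in code for a toy 1-D constant-velocity system. This is only a sketch; dt, the noise levels, the control input, and the initial state are my own assumptions, and the last few lines just sample the two Gaussian densities above.

    import numpy as np

    # Toy 1-D constant-velocity system (the numbers are my own assumptions).
    dt = 0.1
    F_k = np.array([[1.0, dt],
                    [0.0, 1.0]])       # state transition matrix
    B_k = np.array([[0.5 * dt ** 2],
                    [dt]])             # control input matrix
    H_k = np.array([[1.0, 0.0]])       # we observe position only
    Q_k = 0.01 * np.eye(2)             # process noise covariance E[w_k w_k^T]
    R_k = np.array([[0.25]])           # measurement noise covariance E[v_k v_k^T]

    # Sampling the densities: x_{k+1} ~ N(F_k x_k + B_k u_k, Q_k), z_k ~ N(H_k x_k, R_k)
    rng = np.random.default_rng(0)
    x_k, u_k = np.array([0.0, 1.0]), np.array([0.2])
    x_next = F_k @ x_k + B_k @ u_k + rng.multivariate_normal(np.zeros(2), Q_k)
    z_k = H_k @ x_k + rng.multivariate_normal(np.zeros(1), R_k)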

Let’s also define the current posterior distribution at time k:

    \[ p(x_k|z_{1:k}) = \mathcal{N}(x_k; \hat{\mu}_k, \hat{\Sigma}_k) \]

Let’s substitute the distributions in the recursive formulation to obtain the posterior distribution at the next time step, k+1:

    \[ p(x_{k+1}|z_{1:k+1}) =\frac{1}{\eta_{k+1}} \cdot \underbrace{\mathcal{N}(z_{k+1}; H_{k+1} x_{k+1}, R_{k+1})}_\text{measurement update} \underbrace{\int \mathcal{N}(x_{k+1}; F_k x_k + B_k u_k, Q_k)\cdot \mathcal{N}(x_k; \hat{\mu}_k, \hat{\Sigma}_k) dx_k}_\text{prediction} \]

Prediction

If we take only the prediction term denoted above,

    \begin{equation*} \begin{aligned} p(x_{k+1}|z_{1:k}) &= \int \mathcal{N}(x_{k+1}; F_k x_k + B_k u_k, Q_k)\cdot \mathcal{N}(x_k; \hat{\mu}_k, \hat{\Sigma}_k) dx_k \\ &= \frac{1}{\zeta} \int \mathrm{exp}\left(-\frac{1}{2}(x_{k+1} - (F_k x_k + B_k u_k))^TQ_k^{-1}(x_{k+1} - (F_k x_k + B_k u_k))\right) \cdot \mathrm{exp}\left(-\frac{1}{2}(x_k - \hat{\mu}_k)^T\hat{\Sigma}_k^{-1}(x_k - \hat{\mu}_k)\right) dx_k \\ &= \frac{1}{\zeta} \int \mathrm{exp}\left(-\frac{1}{2}((x_{k+1} - (F_k x_k + B_k u_k))^TQ_k^{-1}(x_{k+1} - (F_k x_k + B_k u_k)) + (x_k - \hat{\mu}_k)^T\hat{\Sigma}_k^{-1}(x_k - \hat{\mu}_k))\right)dx_k \end{aligned} \end{equation*}

First, let's take the exponential terms and sort them a bit. It is important to separate them into functions of x_{k+1} and x_k.

    \begin{equation*} \begin{flalign} &(x_{k+1} - (F_k x_k + B_k u_k))^TQ_k^{-1}(x_{k+1} - (F_k x_k + B_k u_k)) + (x_k - \hat{\mu}_k)^T\hat{\Sigma}_k^{-1}(x_k - \hat{\mu}_k) \\ &= x_{k+1}^TQ_k^{-1}x_{k+1} - 2 x_{k+1}^T Q_k^{-1}(F_k x_k+B_k u_k) + (F_k x_k + B_k u_k)^T Q_k^{-1} (F_k x_k + B_k u_k) + x_k^T \hat{\Sigma}_k^{-1} x_k - 2 \hat{\mu}_k^T \hat{\Sigma}_k^{-1} x_k + \hat{\mu}_k^T\hat{\Sigma}_k^{-1}\hat{\mu}_k \\ &= x_{k+1}^TQ_k^{-1}x_{k+1} - 2 x_{k+1}^T Q_k^{-1} F_k x_k - 2 x_{k+1}^T Q_k^{-1}B_k u_k + x_k^T F_k^T Q_k^{-1} F_k x_k + 2 u_k^T B_k^T Q_k^{-1} F_k x_k + u_k^T B_k^T Q_k^{-1} B_k u_k  \\ &\quad\quad + x_k^T \hat{\Sigma}_k^{-1} x_k - 2 \hat{\mu}_k^T \hat{\Sigma}_k^{-1} x_k + \hat{\mu}_k^T\hat{\Sigma}_k^{-1}\hat{\mu}_k \\ &= x_{k+1}^T Q_k^{-1} x_{k+1} - 2 x_{k+1}^T Q_k^{-1}B_k u_k + \hat{\mu}_k^T \hat{\Sigma}_k^{-1} \hat{\mu}_k + u_k^T B_k^T Q_k^{-1} B_k u_k\\ &\quad\quad + \left( x_k^T( \hat{\Sigma}_k^{-1} + F_k^T Q_k^{-1} F_k ) x_k - 2 (x_{k+1}^T Q_k^{-1} F_k - u_k^T B_k^T Q_k^{-1}F_k + \hat{\mu}_k^T \hat{\Sigma}_k^{-1})  x_k \right) \end{flalign}\end{equation*}

Let’s complete the square for the latter half of above equation:

    \begin{equation*}\begin{flalign} &= x_{k+1}^T Q_k^{-1} x_{k+1} - 2 x_{k+1}^T Q_k^{-1}B_k u_k + \hat{\mu}_k^T \hat{\Sigma}_k^{-1} \hat{\mu}_k + u_k^T B_k^T Q_k^{-1} B_k u_k \\ &\quad\quad+ \Big( x_k^T( \hat{\Sigma}_k^{-1} + F_k^T Q_k^{-1} F_k ) x_k - 2 (x_{k+1}^T Q_k^{-1} F_k - u_k^T B_k^T Q_k^{-1}F_k + \hat{\mu}_k^T \hat{\Sigma}_k^{-1})  x_k \\ & \quad\quad\quad\quad + (x_{k+1}^T Q_k^{-1} F_k - u_k^T B_k^T Q_k^{-1}F_k + \hat{\mu}_k^T \hat{\Sigma}_k^{-1}) ( \hat{\Sigma}_k^{-1} + F_k^T Q_k^{-1} F_k )^{-1} ( \hat{\Sigma}_k^{-1} + F_k^T Q_k^{-1} F_k ) \\ & \quad\quad\quad\quad\quad\quad \cdot ( \hat{\Sigma}_k^{-1} + F_k^T Q_k^{-1} F_k )^{-1}(x_{k+1}^T Q_k^{-1} F_k - u_k^T B_k^T Q_k^{-1}F_k + \hat{\mu}_k^T \hat{\Sigma}_k^{-1})^T \\ &\quad\quad\quad\quad - (x_{k+1}^T Q_k^{-1} F_k - u_k^T B_k^T Q_k^{-1}F_k + \hat{\mu}_k^T \hat{\Sigma}_k^{-1}) ( \hat{\Sigma}_k^{-1} + F_k^T Q_k^{-1} F_k )^{-1} ( \hat{\Sigma}_k^{-1} + F_k^T Q_k^{-1} F_k ) \\ & \quad\quad\quad\quad\quad\quad \cdot ( \hat{\Sigma}_k^{-1} + F_k^T Q_k^{-1} F_k )^{-1}(x_{k+1}^T Q_k^{-1} F_k - u_k^T B_k^T Q_k^{-1}F_k + \hat{\mu}_k^T \hat{\Sigma}_k^{-1})^T \Big) \\ &= x_{k+1}^T Q_k^{-1} x_{k+1} - 2 x_{k+1}^T Q_k^{-1}B_k u_k + \hat{\mu}_k^T \hat{\Sigma}_k^{-1} \hat{\mu}_k + u_k^T B_k^T Q_k^{-1} B_k u_k \\ &\quad\quad+ \Big( x_k^T( \hat{\Sigma}_k^{-1} + F_k^T Q_k^{-1} F_k ) x_k - 2 (x_{k+1}^T Q_k^{-1} F_k - u_k^T B_k^T Q_k^{-1}F_k + \hat{\mu}_k^T \hat{\Sigma}_k^{-1})  x_k \\ & \quad\quad\quad\quad + (x_{k+1}^T Q_k^{-1} F_k - u_k^T B_k^T Q_k^{-1}F_k + \hat{\mu}_k^T \hat{\Sigma}_k^{-1}) ( \hat{\Sigma}_k^{-1} + F_k^T Q_k^{-1} F_k )^{-1}(x_{k+1}^T Q_k^{-1} F_k - u_k^T B_k^T Q_k^{-1}F_k + \hat{\mu}_k^T \hat{\Sigma}_k^{-1})^T \\ & \quad\quad\quad\quad - (x_{k+1}^T Q_k^{-1} F_k - u_k^T B_k^T Q_k^{-1}F_k + \hat{\mu}_k^T \hat{\Sigma}_k^{-1}) ( \hat{\Sigma}_k^{-1} + F_k^T Q_k^{-1} F_k )^{-1}(x_{k+1}^T Q_k^{-1} F_k - u_k^T B_k^T Q_k^{-1}F_k + \hat{\mu}_k^T \hat{\Sigma}_k^{-1})^T \Big) \\ &= x_{k+1}^T Q_k^{-1} x_{k+1} - 2 x_{k+1}^T Q_k^{-1}B_k u_k + \hat{\mu}_k^T \hat{\Sigma}_k^{-1} \hat{\mu}_k + u_k^T B_k^T Q_k^{-1} B_k u_k \\ & \quad\quad - (x_{k+1}^T Q_k^{-1} F_k - u_k^T B_k^T Q_k^{-1}F_k + \hat{\mu}_k^T \hat{\Sigma}_k^{-1}) ( \hat{\Sigma}_k^{-1} + F_k^T Q_k^{-1} F_k )^{-1}(x_{k+1}^T Q_k^{-1} F_k - u_k^T B_k^T Q_k^{-1}F_k + \hat{\mu}_k^T \hat{\Sigma}_k^{-1})^T \\ &\quad + \Big( x_k^T( \hat{\Sigma}_k^{-1} + F_k^T Q_k^{-1} F_k ) x_k - 2 (x_{k+1}^T Q_k^{-1} F_k - u_k^T B_k^T Q_k^{-1}F_k + \hat{\mu}_k^T \hat{\Sigma}_k^{-1})  x_k \\ & \quad\quad\quad\quad + (x_{k+1}^T Q_k^{-1} F_k - u_k^T B_k^T Q_k^{-1}F_k + \hat{\mu}_k^T \hat{\Sigma}_k^{-1}) ( \hat{\Sigma}_k^{-1} + F_k^T Q_k^{-1} F_k )^{-1}(x_{k+1}^T Q_k^{-1} F_k - u_k^T B_k^T Q_k^{-1}F_k + \hat{\mu}_k^T \hat{\Sigma}_k^{-1})^T \Big) \end{flalign} \end{equation*}

Phew, that was a lot of equations. Hopefully it's still easy to follow! The last expression is now separated into terms that involve only x_{k+1} (the first two lines) and a quadratic form in x_k (the parenthesized part in the last two lines). Look at that parenthesized part: it is exactly the exponent of a multivariate Gaussian distribution in x_k!

Note that the original prediction equation involves an integral over x_k. The parenthesized part is an unnormalized Gaussian in x_k, so integrating it over x_k simply gives a constant that does not depend on x_{k+1}. Thus:

(1)   \begin{multline*} \int \mathrm{exp}\Big(-\frac{1}{2}\Big(x_k^T( \hat{\Sigma}_k^{-1} + F_k^T Q_k^{-1} F_k ) x_k - 2 (x_{k+1}^T Q_k^{-1} F_k - u_k^T B_k^T Q_k^{-1}F_k + \hat{\mu}_k^T \hat{\Sigma}_k^{-1})  x_k \\ + (x_{k+1}^T Q_k^{-1} F_k - u_k^T B_k^T Q_k^{-1}F_k + \hat{\mu}_k^T \hat{\Sigma}_k^{-1}) ( \hat{\Sigma}_k^{-1} + F_k^T Q_k^{-1} F_k )^{-1}(x_{k+1}^T Q_k^{-1} F_k - u_k^T B_k^T Q_k^{-1}F_k + \hat{\mu}_k^T \hat{\Sigma}_k^{-1})^T\Big)\Big) dx_k = c \end{multline*}

where c is a constant. Note that c \ne 1, because the exponential term alone does not represent a Gaussian density; it is missing the normalization constant in front. Therefore, c is exactly that normalization constant we usually see in front of the exponential.
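
As a quick sanity check of this point, here is a tiny 1-D example (made-up scalar values a and b standing in for the matrices above) showing that the integral of such an unnormalized Gaussian exponential is exactly the usual normalization constant:

    import numpy as np

    # 1-D sanity check: the quadratic-in-x exponential integrates to the Gaussian
    # normalization constant, not to 1. The values of a and b are arbitrary (a > 0).
    a, b = 2.0, 0.7
    xs = np.linspace(-20.0, 20.0, 200001)
    integrand = np.exp(-0.5 * (a * xs ** 2 - 2.0 * b * xs + b ** 2 / a))
    c = np.trapz(integrand, xs)
    print(c, np.sqrt(2.0 * np.pi / a))   # both are approximately 1.7725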

With the integral carried out, we are left with terms in x_{k+1} only, and they had better form a Gaussian distribution. Let's take the remaining terms.

    \begin{equation*} \begin{flalign} & x_{k+1}^T Q_k^{-1} x_{k+1} - 2 x_{k+1}^T Q_k^{-1}B_k u_k + \hat{\mu}_k^T \hat{\Sigma}_k^{-1} \hat{\mu}_k + u_k^T B_k^T Q_k^{-1} B_k u_k \\ & \quad\quad - (x_{k+1}^T Q_k^{-1} F_k - u_k^T B_k^T Q_k^{-1}F_k + \hat{\mu}_k^T \hat{\Sigma}_k^{-1}) ( \hat{\Sigma}_k^{-1} + F_k^T Q_k^{-1} F_k )^{-1}(x_{k+1}^T Q_k^{-1} F_k - u_k^T B_k^T Q_k^{-1}F_k + \hat{\mu}_k^T \hat{\Sigma}_k^{-1})^T \\ &= x_{k+1}^T Q_k^{-1} x_{k+1} - 2 x_{k+1}^T Q_k^{-1}B_k u_k + \hat{\mu}_k^T \hat{\Sigma}_k^{-1} \hat{\mu}_k + u_k^T B_k^T Q_k^{-1} B_k u_k \\ & \quad\quad - x_{k+1}^T Q_k^{-1} F_k (\hat{\Sigma}_k^{-1} + F_k^T Q_k^{-1} F_k)^{-1} F_k^T Q_k^{-1} x_{k+1} - 2 (-u_k^T B_k^T Q_k^{-1} F_k + \hat{\mu}_k^T \hat{\Sigma}_k^{-1})(\hat{\Sigma}_k^{-1} + F_k^T Q_k^{-1} F_k)^{-1} F_k^T Q_k^{-1} x_{k+1} \\ &\quad\quad\quad\quad - (-u_k^T B_k^T Q_k^{-1}F_k +\hat{\mu}_k^T \hat{\Sigma}_k^{-1}) (\hat{\Sigma}_k^{-1} + F_k^T Q_k^{-1} F_k)^{-1} (-u_k^T B_k^T Q_k^{-1}F_k + \hat{\mu}_k^T \hat{\Sigma}_k^{-1})^T \\ &=x_{k+1}^T (Q_k^{-1} - Q_k^{-1} F_k (\hat{\Sigma}_k^{-1} + F_k^T Q_k^{-1} F_k)^{-1} F_k^T Q_k^{-1}) x_{k+1}  \\ &\quad\quad- 2 \left((-u_k^T B_k^T Q_k^{-1} F_k + \hat{\mu}_k^T \hat{\Sigma}_k^{-1}) (\hat{\Sigma}_k^{-1} + F_k^T Q_k^{-1} F_k)^{-1} F_k^T Q_k^{-1}  + u_k^T B_k^T Q_k^{-1}\right) x_{k+1} \\ &\quad\quad\quad\quad - (-u_k^T B_k^T Q_k^{-1}F_k + \hat{\mu}_k^T \hat{\Sigma}_k^{-1}) (\hat{\Sigma}_k^{-1} + F_k^T Q_k^{-1} F_k)^{-1} (-u_k^T B_k^T Q_k^{-1}F_k + \hat{\mu}_k^T \hat{\Sigma}_k^{-1})^T \\  & \quad\quad\quad\quad\quad\quad + \hat{\mu}_k^T \hat{\Sigma}_k^{-1} \hat{\mu}_k + u_k^T B_k^T Q_k^{-1} B_k u_k \end{flalign} \end{equation*}

Let's take the first two lines of the last expression. These are the only terms that have x_{k+1} in them; the rest are just constants, since we know all of those values. Since those two terms serve as the first two terms of a Gaussian exponent, we need to complete the square here again.

    \[(x-a)^TB(x-a) = x^T B x - 2 a^T B x + a^T B a\]
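
If you want to convince yourself of this identity numerically, here is a short check with random values of my own choosing (note that B must be symmetric, which is the case for all the matrices we complete squares with here):

    import numpy as np

    # Numerical check of (x - a)^T B (x - a) = x^T B x - 2 a^T B x + a^T B a
    # for a symmetric positive-definite B (random example values).
    rng = np.random.default_rng(1)
    n = 4
    x = rng.standard_normal(n)
    a = rng.standard_normal(n)
    M = rng.standard_normal((n, n))
    B = M @ M.T + n * np.eye(n)          # symmetric positive definite

    lhs = (x - a) @ B @ (x - a)
    rhs = x @ B @ x - 2.0 * a @ B @ x + a @ B @ a
    assert np.isclose(lhs, rhs)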

Therefore,

    \begin{equation*} \begin{flalign} & x_{k+1}^T (Q_k^{-1} - Q_k^{-1} F_k (\hat{\Sigma}_k^{-1} + F_k^T Q_k^{-1} F_k)^{-1} F_k^T Q_k^{-1}) x_{k+1}\\ &\quad\quad  - 2 \left((-u_k^T B_k^T Q_k^{-1} F_k + \hat{\mu}_k^T \hat{\Sigma}_k^{-1}) (\hat{\Sigma}_k^{-1} + F_k^T Q_k^{-1} F_k)^{-1} F_k^T Q_k^{-1}  + u_k^T B_k^T Q_k^{-1}\right) x_{k+1} \\ &= x_{k+1}^T (Q_k^{-1} - Q_k^{-1} F_k (\hat{\Sigma}_k^{-1} + F_k^T Q_k^{-1} F_k)^{-1} F_k^T Q_k^{-1}) x_{k+1}\\ &\quad\quad - 2 \left((-u_k^T B_k^T Q_k^{-1} F_k + \hat{\mu}_k^T \hat{\Sigma}_k^{-1}) (\hat{\Sigma}_k^{-1} + F_k^T Q_k^{-1} F_k)^{-1} F_k^T Q_k^{-1}  + u_k^T B_k^T Q_k^{-1}\right) x_{k+1} \\ &\quad\quad\quad\quad + \left((-u_k^T B_k^T Q_k^{-1} F_k + \hat{\mu}_k^T \hat{\Sigma}_k^{-1}) (\hat{\Sigma}_k^{-1} + F_k^T Q_k^{-1} F_k)^{-1} F_k^T Q_k^{-1}  + u_k^T B_k^T Q_k^{-1}\right) (Q_k^{-1} - Q_k^{-1} F_k (\hat{\Sigma}_k^{-1} + F_k^T Q_k^{-1} F_k)^{-1} F_k^T Q_k^{-1})^{-1}  \\ &\quad\quad\quad\quad\quad\cdot \left((-u_k^T B_k^T Q_k^{-1} F_k + \hat{\mu}_k^T \hat{\Sigma}_k^{-1}) (\hat{\Sigma}_k^{-1} + F_k^T Q_k^{-1} F_k)^{-1} F_k^T Q_k^{-1}  + u_k^T B_k^T Q_k^{-1}\right)^T \\ &\quad\quad\quad\quad - \left((-u_k^T B_k^T Q_k^{-1} F_k + \hat{\mu}_k^T \hat{\Sigma}_k^{-1}) (\hat{\Sigma}_k^{-1} + F_k^T Q_k^{-1} F_k)^{-1} F_k^T Q_k^{-1}  + u_k^T B_k^T Q_k^{-1}\right)(Q_k^{-1} - Q_k^{-1} F_k (\hat{\Sigma}_k^{-1} + F_k^T Q_k^{-1} F_k)^{-1} F_k^T Q_k^{-1})^{-1}  \\ &\quad\quad\quad\quad\quad\cdot \left((-u_k^T B_k^T Q_k^{-1} F_k + \hat{\mu}_k^T \hat{\Sigma}_k^{-1}) (\hat{\Sigma}_k^{-1} + F_k^T Q_k^{-1} F_k)^{-1} F_k^T Q_k^{-1}  + u_k^T B_k^T Q_k^{-1}\right)^T \end{flalign} \end{equation*}

Okay, now the first four lines after the last equals sign (the quadratic and linear terms in x_{k+1} together with the added square term) form a Gaussian in x_{k+1}, and the remaining terms are all constants. So we get,

    \[p(x_{k+1}|z_{1:k}) = \mathcal{N}(x_{k+1}; \bar{\mu}_{k+1},\bar{\Sigma}_{k+1})\]

where

    \begin{equation*} \begin{align*} \bar{\Sigma}_{k+1} &= (Q_k^{-1} - Q_k^{-1} F_k (\hat{\Sigma}_k^{-1} + F_k^T Q_k^{-1} F_k)^{-1} F_k^T Q_k^{-1})^{-1} \\  &= \boxed{F_k \hat{\Sigma}_k F_k^T + Q_k} \\ \bar{\mu}_{k+1} &= (Q_k^{-1} - Q_k^{-1} F_k (\hat{\Sigma}_k^{-1} + F_k^T Q_k^{-1} F_k)^{-1} F_k^T Q_k^{-1})^{-1} \\ &\quad\quad \cdot \left((-u_k^T B_k^T Q_k^{-1} F_k + \hat{\mu}_k^T \hat{\Sigma}_k^{-1}) (\hat{\Sigma}_k^{-1} + F_k^T Q_k^{-1} F_k)^{-1} F_k^T Q_k^{-1}  + u_k^T B_k^T Q_k^{-1}\right)^T \\  &= \boxed{F_k \hat{\mu}_k + B_k u_k} \end{align*} \end{equation*}
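
The boxed simplifications follow from the matrix inversion (Woodbury) lemma. Here is a quick numerical check of both boxed results, using made-up matrices of my own choosing; it's a sketch of the algebra, not a full filter implementation:

    import numpy as np

    # Check the boxed prediction results numerically (made-up example matrices).
    rng = np.random.default_rng(2)
    F_k = np.array([[1.0, 0.1], [0.0, 1.0]])
    B_k = np.array([[0.005], [0.1]])
    Q_k = 0.01 * np.eye(2)
    u_k = np.array([0.2])
    mu_hat = np.array([0.0, 1.0])                 # \hat{\mu}_k
    A = rng.standard_normal((2, 2))
    Sigma_hat = A @ A.T + np.eye(2)               # \hat{\Sigma}_k (positive definite)

    Qi = np.linalg.inv(Q_k)
    Si = np.linalg.inv(Sigma_hat)
    Ai = np.linalg.inv(Si + F_k.T @ Qi @ F_k)     # (\hat{\Sigma}_k^{-1} + F_k^T Q_k^{-1} F_k)^{-1}
    M = Qi - Qi @ F_k @ Ai @ F_k.T @ Qi

    Sigma_bar = np.linalg.inv(M)
    assert np.allclose(Sigma_bar, F_k @ Sigma_hat @ F_k.T + Q_k)   # boxed covariance

    # linear coefficient from the completed square, transposed into a column vector
    c = Qi @ F_k @ Ai @ (Si @ mu_hat - F_k.T @ Qi @ (B_k @ u_k)) + Qi @ (B_k @ u_k)
    mu_bar = Sigma_bar @ c
    assert np.allclose(mu_bar, F_k @ mu_hat + B_k @ u_k)           # boxed mean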

Measurement Update

Now we have shown that the prediction step yields a Gaussian distribution with mean \bar{\mu}_{k+1} and covariance \bar{\Sigma}_{k+1}. Then, the posterior distribution becomes

    \[p(x_{k+1}|z_{1:k+1}) = \frac{1}{\eta_{k+1}} \mathcal{N}(z_{k+1};H_{k+1} x_{k+1}, R_{k+1}) \cdot \mathcal{N}(x_{k+1}; \bar{\mu}_{k+1}, \bar{\Sigma}_{k+1})\]

Taking the exponential terms only, we get:

    \begin{equation*} \begin{flalign} &(z_{k+1} - H_{k+1} x_{k+1})^T R_{k+1}^{-1} (z_{k+1} - H_{k+1} x_{k+1}) + (x_{k+1}-\bar{\mu}_{k+1})^T \bar{\Sigma}_{k+1}^{-1} (x_{k+1}-\bar{\mu}_{k+1}) \\ &= z_{k+1}^T R_{k+1}^{-1} z_{k+1} - 2 z_{k+1}^T R_{k+1}^{-1} H_{k+1} x_{k+1} + x_{k+1}^T H_{k+1}^T R_{k+1}^{-1} H_{k+1} x_{k+1}  \\ & \quad\quad + x_{k+1}^T \bar{\Sigma}_{k+1}^{-1} x_{k+1} - 2 \bar{\mu}_{k+1}^T  \bar{\Sigma}_{k+1}^{-1} x_{k+1} + \bar{\mu}_{k+1}^T \bar{\Sigma}_{k+1}^{-1} \bar{\mu}_{k+1} \\ &= x_{k+1}^T (\bar{\Sigma}_{k+1}^{-1} + H_{k+1}^T R_{k+1}^{-1} H_{k+1}) x_{k+1} \\ &\quad\quad -2 (z_{k+1}^T R_{k+1}^{-1} H_{k+1} + \bar{\mu}_{k+1}^T \bar{\Sigma}_{k+1}^{-1}) x_{k+1} +\underbrace{z_{k+1}^T R_{k+1}^{-1} z_{k+1} + \bar{\mu}_{k+1}^T \bar{\Sigma}_{k+1}^{-1} \bar{\mu}_{k+1}}_{\text{constants}} \end{flalign} \end{equation*}

Notice that the last two terms are constants. Now completing the square (again):

    \begin{equation*} \begin{flalign} & x_{k+1}^T (\bar{\Sigma}_{k+1}^{-1} + H_{k+1}^T R_{k+1}^{-1} H_{k+1}) x_{k+1} \\ &\quad\quad -2 (z_{k+1}^T R_{k+1}^{-1} H_{k+1} + \bar{\mu}_{k+1}^T \bar{\Sigma}_{k+1}^{-1}) x_{k+1} + \text{constants} \\  &= x_{k+1}^T (\bar{\Sigma}_{k+1}^{-1} + H_{k+1}^T R_{k+1}^{-1} H_{k+1}) x_{k+1}  \\ &\quad\quad -2 (z_{k+1}^T R_{k+1}^{-1} H_{k+1} + \bar{\mu}_{k+1}^T \bar{\Sigma}_{k+1}^{-1}) x_{k+1} \\ &\quad\quad+ (z_{k+1}^T R_{k+1}^{-1} H_{k+1} + \bar{\mu}_{k+1}^T \bar{\Sigma}_{k+1}^{-1}) (\bar{\Sigma}_{k+1}^{-1} + H_{k+1}^T R_{k+1}^{-1} H_{k+1})^{-1} \\ &\quad\quad\quad\quad \cdot (\bar{\Sigma}_{k+1}^{-1} + H_{k+1}^T R_{k+1}^{-1} H_{k+1}) (\bar{\Sigma}_{k+1}^{-1} + H_{k+1}^T R_{k+1}^{-1} H_{k+1})^{-1} (z_{k+1}^T R_{k+1}^{-1} H_{k+1} + \bar{\mu}_{k+1}^T \bar{\Sigma}_{k+1}^{-1})^T \\ &\quad\quad - (z_{k+1}^T R_{k+1}^{-1} H_{k+1} + \bar{\mu}_{k+1}^T \bar{\Sigma}_{k+1}^{-1}) (\bar{\Sigma}_{k+1}^{-1} + H_{k+1}^T R_{k+1}^{-1} H_{k+1})^{-1} \\ &\quad\quad\quad\quad \cdot (\bar{\Sigma}_{k+1}^{-1} + H_{k+1}^T R_{k+1}^{-1} H_{k+1}) (\bar{\Sigma}_{k+1}^{-1} + H_{k+1}^T R_{k+1}^{-1} H_{k+1})^{-1} (z_{k+1}^T R_{k+1}^{-1} H_{k+1} + \bar{\mu}_{k+1}^T \bar{\Sigma}_{k+1}^{-1})^T + \text{constants} \\ &= x_{k+1}^T (\bar{\Sigma}_{k+1}^{-1} + H_{k+1}^T R_{k+1}^{-1} H_{k+1}) x_{k+1}  \\ &\quad\quad -2 (z_{k+1}^T R_{k+1}^{-1} H_{k+1} + \bar{\mu}_{k+1}^T \bar{\Sigma}_{k+1}^{-1}) x_{k+1} \\ &\quad\quad+ (z_{k+1}^T R_{k+1}^{-1} H_{k+1} + \bar{\mu}_{k+1}^T \bar{\Sigma}_{k+1}^{-1}) (\bar{\Sigma}_{k+1}^{-1} + H_{k+1}^T R_{k+1}^{-1} H_{k+1})^{-1} (z_{k+1}^T R_{k+1}^{-1} H_{k+1} + \bar{\mu}_{k+1}^T \bar{\Sigma}_{k+1}^{-1})^T \\ &\quad\quad - (z_{k+1}^T R_{k+1}^{-1} H_{k+1} + \bar{\mu}_{k+1}^T \bar{\Sigma}_{k+1}^{-1}) (\bar{\Sigma}_{k+1}^{-1} + H_{k+1}^T R_{k+1}^{-1} H_{k+1})^{-1} (z_{k+1}^T R_{k+1}^{-1} H_{k+1} + \bar{\mu}_{k+1}^T \bar{\Sigma}_{k+1}^{-1})^T  + \text{constants} \\ &= x_{k+1}^T (\bar{\Sigma}_{k+1}^{-1} + H_{k+1}^T R_{k+1}^{-1} H_{k+1}) x_{k+1}  \\ &\quad\quad -2 (z_{k+1}^T R_{k+1}^{-1} H_{k+1} + \bar{\mu}_{k+1}^T \bar{\Sigma}_{k+1}^{-1}) x_{k+1} \\ &\quad\quad+ (z_{k+1}^T R_{k+1}^{-1} H_{k+1} + \bar{\mu}_{k+1}^T \bar{\Sigma}_{k+1}^{-1}) (\bar{\Sigma}_{k+1}^{-1} + H_{k+1}^T R_{k+1}^{-1} H_{k+1})^{-1} (z_{k+1}^T R_{k+1}^{-1} H_{k+1} + \bar{\mu}_{k+1}^T \bar{\Sigma}_{k+1}^{-1})^T  + \text{constants} \\ \end{flalign} \end{equation*}

Then we get the posterior mean and covariance:

    \begin{equation*} \begin{align} p(x_{k+1}|z_{1:k+1}) &= \mathcal{N}(x_{k+1}; \hat{\mu}_{k+1}, \hat{\Sigma}_{k+1}) \\ \\ \hat{\Sigma}_{k+1} &= (\bar{\Sigma}_{k+1}^{-1} + H_{k+1}^T R_{k+1}^{-1} H_{k+1})^{-1} \\ &= \bar{\Sigma}_{k+1} - \bar{\Sigma}_{k+1} H_{k+1}^T(R_{k+1} + H_{k+1} \bar{\Sigma}_{k+1} H_{k+1}^T)^{-1} H_{k+1}\bar{\Sigma}_{k+1} \\ &= (I - \bar{\Sigma}_{k+1} H_{k+1}^T(R_{k+1} + H_{k+1} \bar{\Sigma}_{k+1} H_{k+1}^T)^{-1} H_{k+1})\bar{\Sigma}_{k+1} \\ &= \boxed{(I - K_{k+1} H_{k+1})\bar{\Sigma}_{k+1}} \\ \hat{\mu}_{k+1} &= (\bar{\Sigma}_{k+1}^{-1} + H_{k+1}^T R_{k+1}^{-1} H_{k+1})^{-1} (z_{k+1}^T R_{k+1}^{-1} H_{k+1} + \bar{\mu}_{k+1}^T \bar{\Sigma}_{k+1}^{-1})^T \\ &= (\bar{\Sigma}_{k+1}^{-1} + H_{k+1}^T R_{k+1}^{-1} H_{k+1})^{-1} (H_{k+1}^T R_{k+1}^{-1} z_{k+1} + \bar{\Sigma}_{k+1}^{-1} \bar{\mu}_{k+1}) \\ &= \boxed{\bar{\mu}_{k+1} + K_{k+1} (z_{k+1} - H_{k+1} \bar{\mu}_{k+1})} \\ \\ \text{where}\quad K_{k+1} &= \bar{\Sigma}_{k+1} H_{k+1}^T S_{k+1}^{-1} \\ S_{k+1} &= R_{k+1} + H_{k+1} \bar{\Sigma}_{k+1} H_{k+1}^T \end{align} \end{equation*}
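
As with the prediction step, we can sanity-check the boxed measurement-update equations numerically. The sketch below uses made-up values for \bar{\mu}_{k+1}, \bar{\Sigma}_{k+1}, H_{k+1}, R_{k+1}, and z_{k+1}, and verifies that the information form and the Kalman-gain form give the same posterior mean and covariance:

    import numpy as np

    # Check the boxed measurement-update results (made-up example values).
    rng = np.random.default_rng(3)
    H = np.array([[1.0, 0.0]])                    # H_{k+1}
    R = np.array([[0.25]])                        # R_{k+1}
    A = rng.standard_normal((2, 2))
    Sigma_bar = A @ A.T + np.eye(2)               # \bar{\Sigma}_{k+1}
    mu_bar = np.array([0.3, 1.1])                 # \bar{\mu}_{k+1}
    z = np.array([0.5])                           # z_{k+1}

    S = R + H @ Sigma_bar @ H.T                   # innovation covariance S_{k+1}
    K = Sigma_bar @ H.T @ np.linalg.inv(S)        # Kalman gain K_{k+1}

    # covariance: information form vs. (I - K H) \bar{\Sigma} form
    Sigma_info = np.linalg.inv(np.linalg.inv(Sigma_bar) + H.T @ np.linalg.inv(R) @ H)
    assert np.allclose(Sigma_info, (np.eye(2) - K @ H) @ Sigma_bar)

    # mean: information form vs. \bar{\mu} + K (z - H \bar{\mu}) form
    mu_info = Sigma_info @ (H.T @ np.linalg.inv(R) @ z + np.linalg.inv(Sigma_bar) @ mu_bar)
    assert np.allclose(mu_info, mu_bar + K @ (z - H @ mu_bar))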

Phew, it took a long time to write this post because I needed to check every line of the equations. Usually people do not take this approach because the other approach using the expectation operator is way easier. But in case you ever wondered how products of Gaussians and integration/marginalization work out with probability distributions, you know it now. The trick is to complete the square and not make mistakes while writing out all these terms. Hope this is helpful.
