In my last post, I talked about deriving the Kalman filter using the expectation operator and mentioned that we can also derive the Kalman filter using a hidden Markov model (HMM) with Gaussian distributions. So in this post, I’ll cover that derivation.
Kalman Filter in Hidden Markov Model
A Markov model “describes a sequence of possible events in which the probability of each event depends only on the state attained in the previous event” (source: Wikipedia). A hidden Markov model is a Markov model in which the state is unobserved.
The figure above shows the probabilistic graphical model of the HMM used in the Kalman filter. It’s quite simple to look at. The filter is initialized at time 0 with a prior $p(x_0)$. As time progresses, the time index increases and measurements become available. As mentioned in the previous two posts, the Kalman filter takes two steps: prediction and measurement update. The prediction propagates the current state to the next time step. From time 0 to 1, we compute $p(x_1)$. Then we do the measurement update and get $p(x_1 \mid z_1)$. Then we again predict for the next time step and compute $p(x_2 \mid z_1)$, and do the measurement update to compute $p(x_2 \mid z_{1:2})$. Now, it is important to note that the posterior state at time 2, $p(x_2 \mid z_{1:2})$, is computed given both measurements at time 1 and 2. Note that $z_{1:t}$ represents all measurements from time $1$ to $t$, i.e., $z_{1:t} = \{z_1, z_2, \dots, z_t\}$.
This could become a big issue: as time increases, the number of measurements to keep track of increases linearly, and computing the posterior distribution of $x_t$ would require an increasing amount of computation. Fortunately, this is where the Markov model helps. The Markov model lets us describe our state at each time step perfectly given only the previous state. That is, we can compute $p(x_t \mid x_{t-1}, z_{1:t-1})$, or equivalently $p(x_t \mid x_{t-1})$, without fusing all the measurements up to time $t-1$, $z_{1:t-1}$, if we know the state $x_{t-1}$. Let’s look at the math a little bit here.
Note that $p(x_t \mid x_{t-1}, z_{1:t-1}) = p(x_t \mid x_{t-1})$ is due to the Markov assumption, where all measurements up to $t-1$ become independent of $x_t$ when given $x_{t-1}$. If you’re familiar with directed graphs in probabilistic graphical models, you’ll notice that conditional independence from the above diagram as well.
Given the Markov assumption, we can derive a recursive formulation of the Kalman filter.
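Written out explicitly (in my assumed notation, with $x_t$ the state, $z_{1:t}$ the measurements, and $\eta$ a normalization constant), this is the standard Bayes filter recursion:

$$
\begin{aligned}
p(x_t \mid z_{1:t}) &= \frac{p(z_t \mid x_t)\, p(x_t \mid z_{1:t-1})}{p(z_t \mid z_{1:t-1})} \\
&= \eta\, p(z_t \mid x_t) \int p(x_t \mid x_{t-1})\, p(x_{t-1} \mid z_{1:t-1})\, \mathrm{d}x_{t-1}
\end{aligned}
$$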
where $\eta = 1 / p(z_t \mid z_{1:t-1})$ is a normalization constant. So now we can express the posterior distribution at time $t$, $p(x_t \mid z_{1:t})$, recursively as a function of the posterior distribution at the previous time step, $p(x_{t-1} \mid z_{1:t-1})$.
Plug in Gaussian distributions
With the recursive formulation obtained above, let’s define the system model using Gaussian distributions.
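With linear dynamics and measurement models, the two conditional distributions are (I’m using $F$ for the state transition matrix and $H$ for the measurement matrix; the symbols are my assumption):

$$
p(x_t \mid x_{t-1}) = \mathcal{N}\big(x_t;\; F x_{t-1},\; Q\big), \qquad
p(z_t \mid x_t) = \mathcal{N}\big(z_t;\; H x_t,\; R\big)
$$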
where $Q$ and $R$ represent the process and measurement noise covariance matrices, respectively. Note that the notation $\mathcal{N}(x;\, \mu,\, \Sigma)$ represents a Gaussian distribution of a random variable $x$ whose mean is $\mu$ and covariance is $\Sigma$.
Let’s also define the current posterior distribution at time $t-1$:
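Using a hat for the posterior mean and $P$ for the posterior covariance (again, my assumed symbols):

$$
p(x_{t-1} \mid z_{1:t-1}) = \mathcal{N}\big(x_{t-1};\; \hat{x}_{t-1},\; P_{t-1}\big)
$$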
Let’s substitute these distributions into the recursive formulation to obtain the posterior distribution at the next time step, $p(x_t \mid z_{1:t})$:
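Under the notation assumed above, the substitution gives:

$$
p(x_t \mid z_{1:t}) = \eta\; \mathcal{N}\big(z_t;\; H x_t,\; R\big) \int \mathcal{N}\big(x_t;\; F x_{t-1},\; Q\big)\, \mathcal{N}\big(x_{t-1};\; \hat{x}_{t-1},\; P_{t-1}\big)\, \mathrm{d}x_{t-1}
$$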
If we only take the prediction terms denoted above,
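That is, the prediction is the marginalization over the previous state:

$$
p(x_t \mid z_{1:t-1}) = \int \mathcal{N}\big(x_t;\; F x_{t-1},\; Q\big)\, \mathcal{N}\big(x_{t-1};\; \hat{x}_{t-1},\; P_{t-1}\big)\, \mathrm{d}x_{t-1}
$$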
First, let’s take the exponential terms and sort them a bit. It is important to separate them as a function of $x_{t-1}$ and $x_t$.
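The summed exponent of the two Gaussians inside the integral, grouped by powers of $x_{t-1}$, is (my reconstruction in the notation above):

$$
\begin{aligned}
&-\tfrac{1}{2}\Big[(x_t - F x_{t-1})^\top Q^{-1} (x_t - F x_{t-1}) + (x_{t-1} - \hat{x}_{t-1})^\top P_{t-1}^{-1} (x_{t-1} - \hat{x}_{t-1})\Big] \\
&= -\tfrac{1}{2}\Big[x_{t-1}^\top \big(F^\top Q^{-1} F + P_{t-1}^{-1}\big)\, x_{t-1}
 - 2\, x_{t-1}^\top \big(F^\top Q^{-1} x_t + P_{t-1}^{-1} \hat{x}_{t-1}\big)
 + x_t^\top Q^{-1} x_t + \hat{x}_{t-1}^\top P_{t-1}^{-1} \hat{x}_{t-1}\Big]
\end{aligned}
$$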
Let’s complete the square for the latter half of the above equation:
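Defining $M = F^\top Q^{-1} F + P_{t-1}^{-1}$ and $m = M^{-1}\big(F^\top Q^{-1} x_t + P_{t-1}^{-1} \hat{x}_{t-1}\big)$ (symbols of my choosing), completing the square in $x_{t-1}$ gives:

$$
-\tfrac{1}{2}\,(x_{t-1} - m)^\top M\, (x_{t-1} - m)
\;-\;\tfrac{1}{2}\Big[x_t^\top Q^{-1} x_t + \hat{x}_{t-1}^\top P_{t-1}^{-1} \hat{x}_{t-1} - m^\top M\, m\Big]
$$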
Phew – that was a lot of equations. Hopefully it’s clear enough to follow! The last equation is now separated into a part in $x_{t-1}$ and a part in $x_t$. Look at the quadratic term in $x_{t-1}$: that just looks like the exponential term of a multivariate Gaussian distribution!
Note that the original prediction equation involves an integral with respect to $x_{t-1}$, and that the integral of a Gaussian distribution equals one. Thus:
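Concretely, with $M = F^\top Q^{-1} F + P_{t-1}^{-1}$ and $m = M^{-1}(F^\top Q^{-1} x_t + P_{t-1}^{-1} \hat{x}_{t-1})$ denoting the completed-square precision and mean, and $\eta'$ the matching normalization constant:

$$
\eta' \int \exp\!\Big(-\tfrac{1}{2}\,(x_{t-1} - m)^\top M\, (x_{t-1} - m)\Big)\, \mathrm{d}x_{t-1} = 1,
\qquad \eta' = \big|2\pi M^{-1}\big|^{-1/2}
$$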
where $\eta'$ is a constant. Note that the exponential term alone doesn’t represent a Gaussian, because it’s missing the constant part. Therefore, $\eta'$ is the normalization constant we usually see in front of the exponential terms.
Now, with the integral carried out, we have only terms in $x_t$ left. And they’d better make a Gaussian distribution. Now, let’s take the rest of the terms.
Let’s take the first two lines of the last equation. These are the only terms that have $x_t$ in them; the rest are now just constants, since we know all those values. Since those terms supply the quadratic and linear parts of a Gaussian exponent, we need to complete the square here again.
Okay – now if we take the first four lines of the above equation, we get a Gaussian. Note that the remaining terms are all constants. So we get,
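Applying the matrix inversion lemma to $Q^{-1} - Q^{-1} F M^{-1} F^\top Q^{-1}$ (with $M = F^\top Q^{-1} F + P_{t-1}^{-1}$, my notation from above), the surviving $x_t$ terms collapse to:

$$
-\tfrac{1}{2}\,\big(x_t - F \hat{x}_{t-1}\big)^\top \big(F P_{t-1} F^\top + Q\big)^{-1} \big(x_t - F \hat{x}_{t-1}\big)
$$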
Now we have shown that the prediction is a Gaussian distribution with mean $\bar{x}_t = F \hat{x}_{t-1}$ and covariance $\bar{P}_t = F P_{t-1} F^\top + Q$. Then the posterior distribution becomes
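With $\bar{x}_t = F \hat{x}_{t-1}$ and $\bar{P}_t = F P_{t-1} F^\top + Q$ as my shorthand for the predicted mean and covariance:

$$
p(x_t \mid z_{1:t}) = \eta\; \mathcal{N}\big(z_t;\; H x_t,\; R\big)\; \mathcal{N}\big(x_t;\; \bar{x}_t,\; \bar{P}_t\big)
$$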
Taking the exponential terms only, we get:
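Writing $\bar{x}_t$ and $\bar{P}_t$ for the predicted mean and covariance, the exponent is (my reconstruction):

$$
\begin{aligned}
&-\tfrac{1}{2}\Big[(z_t - H x_t)^\top R^{-1} (z_t - H x_t) + (x_t - \bar{x}_t)^\top \bar{P}_t^{-1} (x_t - \bar{x}_t)\Big] \\
&= -\tfrac{1}{2}\Big[x_t^\top \big(H^\top R^{-1} H + \bar{P}_t^{-1}\big)\, x_t
 - 2\, x_t^\top \big(H^\top R^{-1} z_t + \bar{P}_t^{-1} \bar{x}_t\big)
 + z_t^\top R^{-1} z_t + \bar{x}_t^\top \bar{P}_t^{-1} \bar{x}_t\Big]
\end{aligned}
$$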
Notice that the last two terms are constants. Now completing the square (again):
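Defining $P_t = \big(H^\top R^{-1} H + \bar{P}_t^{-1}\big)^{-1}$ and $\hat{x}_t = P_t \big(H^\top R^{-1} z_t + \bar{P}_t^{-1} \bar{x}_t\big)$ (my symbols, with $\bar{x}_t$, $\bar{P}_t$ the predicted mean and covariance), the exponent becomes:

$$
-\tfrac{1}{2}\,(x_t - \hat{x}_t)^\top P_t^{-1}\, (x_t - \hat{x}_t) + \text{const.}
$$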
Then we get the posterior mean and covariance:
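In information form this is the completed-square result from above; applying the matrix inversion lemma turns it into the familiar Kalman-gain form (with $\bar{x}_t = F \hat{x}_{t-1}$, $\bar{P}_t = F P_{t-1} F^\top + Q$):

$$
\begin{aligned}
P_t &= \big(H^\top R^{-1} H + \bar{P}_t^{-1}\big)^{-1} = (I - K_t H)\, \bar{P}_t \\
\hat{x}_t &= P_t \big(H^\top R^{-1} z_t + \bar{P}_t^{-1} \bar{x}_t\big) = \bar{x}_t + K_t\,\big(z_t - H \bar{x}_t\big)
\end{aligned}
$$

where $K_t = \bar{P}_t H^\top \big(H \bar{P}_t H^\top + R\big)^{-1}$ is the Kalman gain.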
Phew… it took a long time to write this post because I needed to check every line of the equations. Usually people do not take this approach because the other approach using the expectation operator is way easier. But in case you ever wondered how the product of Gaussians and integration/marginalization work out with probabilities, now you know. The trick is to complete the square and not make mistakes while writing out all these terms. Hope this is helpful.
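Since every step above is just products and marginalizations of Gaussians, it’s easy to sanity-check numerically. Below is a small scalar sketch (all values, and the symbols `F`, `H`, `Q`, `R`, are made up for illustration, not from the derivation itself): it verifies by brute-force grid integration that the prediction is $\mathcal{N}(F\hat{x}_{t-1},\, F P_{t-1} F + Q)$ and that the completed-square posterior matches the grid-normalized Bayes posterior.

```python
import math

def gauss(x, mean, var):
    """Density of the 1-D Gaussian N(x; mean, var)."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)

# Hypothetical scalar system; these symbols and values are illustrative only.
F, H, Q, R = 0.9, 1.0, 0.4, 0.2
m_prev, p_prev = 1.5, 0.25   # posterior mean/variance at time t-1
z = 1.1                      # measurement at time t

# Closed-form prediction (the marginalization result).
m_bar = F * m_prev
p_bar = F * p_prev * F + Q

# Closed-form update (the completed-square result, in gain form).
K = p_bar * H / (H * p_bar * H + R)
m_post = m_bar + K * (z - H * m_bar)
p_post = (1.0 - K * H) * p_bar

# Brute-force checks on a grid (midpoint rule).
lo, hi, n = -15.0, 15.0, 6000
dx = (hi - lo) / n
xs = [lo + (i + 0.5) * dx for i in range(n)]

# 1) Prediction: integrate p(x_t | x_{t-1}) p(x_{t-1} | z_{1:t-1}) dx_{t-1}.
def predicted_density(x_t):
    return sum(gauss(x_t, F * xp, Q) * gauss(xp, m_prev, p_prev) for xp in xs) * dx

for x_t in (-1.0, 0.5, 2.0):
    assert abs(predicted_density(x_t) - gauss(x_t, m_bar, p_bar)) < 1e-6

# 2) Update: normalize likelihood * prediction and compare its moments.
unnorm = [gauss(z, H * x, R) * gauss(x, m_bar, p_bar) for x in xs]
norm = sum(unnorm) * dx
mean_num = sum(x * w for x, w in zip(xs, unnorm)) * dx / norm
var_num = sum((x - mean_num) ** 2 * w for x, w in zip(xs, unnorm)) * dx / norm
assert abs(mean_num - m_post) < 1e-6
assert abs(var_num - p_post) < 1e-6
print("prediction and update match the brute-force Bayes computation")
```

The grid integration is slow but has no Kalman-specific knowledge in it, so agreement is good evidence the completed-square algebra is right.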