In my last post, I presented the Kalman filter derivation in two different ways: 1) using expectations and 2) using probabilities. We know the linear Kalman filter is great when you have a conveniently ideal system: the dynamics are linear and the noise is correctly modeled as white Gaussian.
However, a lot of the time our dynamics and/or measurement model is not linear. Unfortunately, we then cannot use the Kalman filter we talked about, but have to use one designed for a non-linear system. It's called the extended Kalman filter (EKF). I'll present the equations first, and then we can walk through the derivation 🙂
The non-linear system is described as follows:

$$x_k = f(x_{k-1}, u_k) + w_k$$
$$z_k = h(x_k) + v_k$$

where $w_k \sim \mathcal{N}(0, Q_k)$ and $v_k \sim \mathcal{N}(0, R_k)$.
The prediction is very similar to the Kalman filter. Here are the equations:

$$\hat{x}_{k|k-1} = f(\hat{x}_{k-1|k-1}, u_k)$$
$$P_{k|k-1} = F_k P_{k-1|k-1} F_k^T + Q_k$$
The matrix $F_k$ here is different from what we saw in the linear Kalman filter. It is the Jacobian matrix of the prediction function $f$. More formally,

$$F_k = \left. \frac{\partial f}{\partial x} \right|_{\hat{x}_{k-1|k-1},\, u_k}, \qquad (F_k)_{ij} = \frac{\partial f_i}{\partial x_j}$$

where $f_i$ indicates the i-th component of the function $f$, and $x_j$ indicates the j-th component of the vector $x$.
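Deriving a Jacobian by hand is error-prone, so it can help to sanity-check it numerically with finite differences. Here is a minimal sketch, assuming NumPy; the helper name `numerical_jacobian` is mine, not something from this post:

```python
import numpy as np

def numerical_jacobian(f, x, eps=1e-6):
    """Approximate the Jacobian of f at x with central differences.

    f: callable mapping an n-vector to an m-vector.
    Returns an (m, n) matrix whose (i, j) entry approximates d f_i / d x_j.
    """
    x = np.asarray(x, dtype=float)
    fx = np.asarray(f(x))
    J = np.zeros((fx.size, x.size))
    for j in range(x.size):
        dx = np.zeros_like(x)
        dx[j] = eps
        J[:, j] = (np.asarray(f(x + dx)) - np.asarray(f(x - dx))) / (2 * eps)
    return J

# Example: f(x) = [x0^2, x0*x1]; the analytic Jacobian at (1, 2) is [[2, 0], [2, 1]].
J = numerical_jacobian(lambda x: np.array([x[0] ** 2, x[0] * x[1]]),
                       np.array([1.0, 2.0]))
```

Comparing such a numeric Jacobian against your analytic $F_k$ at a few sample states catches most sign and index mistakes.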
The measurement update equations look very similar to the linear Kalman filter as well:

$$\tilde{y}_k = z_k - h(\hat{x}_{k|k-1})$$
$$S_k = H_k P_{k|k-1} H_k^T + R_k$$
$$K_k = P_{k|k-1} H_k^T S_k^{-1}$$
$$\hat{x}_{k|k} = \hat{x}_{k|k-1} + K_k \tilde{y}_k$$
$$P_{k|k} = (I - K_k H_k) P_{k|k-1}$$

Again, the matrix $H_k$ is not the measurement model we've seen in the linear Kalman filter. Similar to $F_k$ in the prediction step, it is the Jacobian matrix of the non-linear measurement model $h$, evaluated at the predicted state:

$$H_k = \left. \frac{\partial h}{\partial x} \right|_{\hat{x}_{k|k-1}}$$
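The predict and update equations above translate almost line-for-line into code. Below is a minimal sketch with NumPy; the function name `ekf_step` and the toy model (constant-velocity state with a squared-position measurement) are my own illustrations, not from the post:

```python
import numpy as np

def ekf_step(x, P, u, z, f, h, F_jac, H_jac, Q, R):
    """One predict + update cycle of an extended Kalman filter."""
    # Predict: push the mean through f, the covariance through the Jacobian F_k.
    F = F_jac(x, u)
    x_pred = f(x, u)
    P_pred = F @ P @ F.T + Q

    # Update: innovation, innovation covariance, Kalman gain, correction.
    H = H_jac(x_pred)
    y = z - h(x_pred)                      # innovation
    S = H @ P_pred @ H.T + R               # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)    # Kalman gain
    x_post = x_pred + K @ y
    P_post = (np.eye(len(x)) - K @ H) @ P_pred
    return x_post, P_post

# Toy model: state [position, velocity], non-linear measurement z = position^2.
dt = 0.1
f = lambda x, u: np.array([x[0] + dt * x[1], x[1]])
F_jac = lambda x, u: np.array([[1.0, dt], [0.0, 1.0]])
h = lambda x: np.array([x[0] ** 2])
H_jac = lambda x: np.array([[2.0 * x[0], 0.0]])
Q = 0.01 * np.eye(2)
R = np.array([[0.1]])

x_post, P_post = ekf_step(np.array([1.0, 0.5]), np.eye(2), None,
                          np.array([1.2]), f, h, F_jac, H_jac, Q, R)
```

Note that only the mean is propagated through the full non-linear $f$ and $h$; the covariances use the Jacobians, which is exactly where the linearization enters.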
This section describes how the extended Kalman filter is derived. If you’re not interested, feel free to skip. The equations above are more than enough to get you started!
Before jumping into the derivation, we first need to know how to linearize a non-linear function. Simply put, the extended Kalman filter is an extension of the linear Kalman filter that works with a non-linear function: it approximates the non-linear function with a linear one. Of course, this is an approximation, which means it's not perfect. When the approximation is poor, the mismatch between the approximated linear function and the non-linear function becomes large, and that is where the extended Kalman filter faces challenges. Nonetheless, let's look at how linearization is done.
We use the Taylor series to linearize a function. The Taylor series describes a function as an infinite sum of terms that uses derivatives at a single point (wikipedia). One can describe a function $f$ around a point $a$ as a series:

$$f(x) = f(a) + f'(a)(x - a) + \frac{f''(a)}{2!}(x - a)^2 + \cdots$$

And when you take a vector form:

$$f(\mathbf{x}) = f(\mathbf{a}) + \nabla f(\mathbf{a})^T (\mathbf{x} - \mathbf{a}) + \frac{1}{2} (\mathbf{x} - \mathbf{a})^T \nabla^2 f(\mathbf{a}) (\mathbf{x} - \mathbf{a}) + \cdots$$
You can write an infinite number of higher-order terms after the second-order term written above. You can also see that once we truncate the series after the first-order term, the function becomes linear in $(x - a)$. We can therefore approximate the non-linear function as:

$$f(x) \approx f(a) + \left. \frac{\partial f}{\partial x} \right|_{a} (x - a)$$
Note that a function can be linearized at an arbitrary point $a$. It will still be a good approximation of the non-linear function in the neighborhood of $a$.
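You can see this "good near the point, worse far away" behavior numerically. A tiny sketch, using $f(x) = \sin(x)$ linearized about $a = 0.5$ as my own example:

```python
import numpy as np

# First-order Taylor (linearization) of f(x) = sin(x) about a = 0.5:
#   f(x) ≈ f(a) + f'(a) (x - a) = sin(a) + cos(a) (x - a)
a = 0.5
lin = lambda x: np.sin(a) + np.cos(a) * (x - a)

err_near = abs(np.sin(0.51) - lin(0.51))  # x close to a: error is tiny
err_far = abs(np.sin(1.5) - lin(1.5))     # x far from a: error grows
```

This is exactly the EKF's failure mode: when the state estimate strays far from the point of linearization, the linear model no longer represents the true dynamics well.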
To apply this to the extended Kalman filter derivation, we represent it in terms of the estimation error, namely $x_k - \hat{x}_{k|k-1}$ or $x_k - \hat{x}_{k|k}$, where $\hat{x}_{k|k-1}$ and $\hat{x}_{k|k}$ both denote an estimate of the variable and $x_k$ is the true value. Note that the true value is not accessible to us: we can't know $x_k$ due to the noises. However, we want the estimate to be as close as possible to the true value; thus we minimize $x_k - \hat{x}_{k|k-1}$ or $x_k - \hat{x}_{k|k}$.
Given a posterior estimate of mean $\hat{x}_{k-1|k-1}$ and covariance $P_{k-1|k-1}$ at time $k-1$, we now want the estimates for the next time step. First, we linearize $f$ about the current estimate:

$$f(x_{k-1}, u_k) \approx f(\hat{x}_{k-1|k-1}, u_k) + F_k (x_{k-1} - \hat{x}_{k-1|k-1})$$
We can also linearize the measurement model:

$$h(x_k) \approx h(\hat{x}_{k|k-1}) + H_k (x_k - \hat{x}_{k|k-1})$$
Looking back at the measurement update part of the previous Kalman filter derivation, we need the cross-covariance terms. Let's get them now. Writing the predicted measurement as $\hat{z}_k = h(\hat{x}_{k|k-1})$ and using the linearization above:

$$P_{xz} = E\left[(x_k - \hat{x}_{k|k-1})(z_k - \hat{z}_k)^T\right] \approx P_{k|k-1} H_k^T$$
$$P_{zz} = E\left[(z_k - \hat{z}_k)(z_k - \hat{z}_k)^T\right] \approx H_k P_{k|k-1} H_k^T + R_k = S_k$$
Now, let's get the posterior mean and covariance:

$$\hat{x}_{k|k} = \hat{x}_{k|k-1} + P_{xz} P_{zz}^{-1} (z_k - \hat{z}_k) = \hat{x}_{k|k-1} + K_k \tilde{y}_k$$
$$P_{k|k} = P_{k|k-1} - K_k P_{zz} K_k^T = (I - K_k H_k) P_{k|k-1}$$
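It's easy to convince yourself numerically that the cross-covariance form of the posterior covariance agrees with the familiar $(I - K H) P$ form. A small sketch with randomly generated stand-in matrices (not from the post):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 3, 2
A = rng.standard_normal((n, n))
P_pred = A @ A.T + n * np.eye(n)            # an SPD prior covariance
H = rng.standard_normal((m, n))             # linearized measurement Jacobian
R = 0.5 * np.eye(m)                         # measurement noise covariance

P_xz = P_pred @ H.T                         # cross covariance
P_zz = H @ P_pred @ H.T + R                 # innovation covariance
K = P_xz @ np.linalg.inv(P_zz)              # Kalman gain

post_cross = P_pred - K @ P_zz @ K.T        # cross-covariance form
post_std = (np.eye(n) - K @ H) @ P_pred     # familiar (I - KH) P form
same = np.allclose(post_cross, post_std)
```

Both expressions reduce to $P - P H^T P_{zz}^{-1} H P$ once you substitute $K = P_{xz} P_{zz}^{-1}$, which is why the check passes.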