Extended Kalman Filter

In my last post, I presented the Kalman filter derivation in two different ways: 1) using expectations and 2) using probabilities. The linear Kalman filter works great when you have a conveniently ideal system: one that is linear, with noise correctly modeled as white and Gaussian.

However, much of the time our dynamics and/or measurement model is not linear. Unfortunately, we then cannot use the Kalman filter we talked about, but have to use one designed for non-linear systems. It’s called the extended Kalman filter (EKF). I’ll present the equations first, and then we can walk through the derivation 🙂

System Setup

The non-linear system is described as follows:

    \begin{equation*} \begin{align}x_{k+1} &= f(x_k, u_k) + w_k \\z_{k+1} &= h(x_{k+1}) + v_{k+1} \\ \\w_k & \sim \mathcal{N}(w_k; 0, Q_k) \\v_{k+1} & \sim \mathcal{N}(v_{k+1}; 0, R_{k+1})\end{align} \end{equation*}


where Q_k = E[w_k w_k^T] and R_{k+1} = E[v_{k+1} v_{k+1}^T].
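
To make this setup concrete, here is a minimal sketch in Python (numpy) of what f, h, Q, and R might look like. The model, a unicycle robot with a range-bearing sensor at the origin, and all names and numbers in it are illustrative choices of mine, not part of the formulation above.

    import numpy as np

    dt = 0.1  # time step (an assumed value)

    def f(x, u):
        """Non-linear dynamics x_{k+1} = f(x_k, u_k).
        State x = [px, py, theta], control u = [v, omega]."""
        px, py, theta = x
        v, omega = u
        return np.array([px + v * np.cos(theta) * dt,
                         py + v * np.sin(theta) * dt,
                         theta + omega * dt])

    def h(x):
        """Non-linear measurement z = h(x): range and bearing to the origin."""
        px, py, _ = x
        return np.array([np.hypot(px, py), np.arctan2(py, px)])

    Q = np.diag([0.01, 0.01, 0.005])  # process noise covariance E[w w^T]
    R = np.diag([0.1, 0.02])          # measurement noise covariance E[v v^T]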

Prediction

The prediction step is very similar to the linear Kalman filter’s. Here are the equations:

    \begin{equation*}\begin{align}\bar{\mu}_{k+1} &= f(\hat{\mu}_k, u_k) \\\bar{\Sigma}_{k+1} &= F_k \hat{\Sigma}_k F_k^T + Q_k\end{align}\end{equation*}

The F_k matrix here is different from the state-transition matrix we saw in the linear Kalman filter. It is the Jacobian matrix of the prediction function f(x, u), evaluated at the current estimate. More formally,

    \[F = \frac{\partial}{\partial x} f(x, u) = \begin{bmatrix} \frac{\partial f_1}{\partial x_1} & \cdots &  \frac{\partial f_1}{\partial x_{n_x}} \\ \vdots & \ddots & \vdots \\ \frac{\partial f_{n_x}}{\partial x_1} & \cdots &  \frac{\partial f_{n_x}}{\partial x_{n_x}} \end{bmatrix}\]

where f_i indicates the i-th component of the function f, and x_i indicates the i-th component of the vector x.
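
As a sketch of how F_k might be computed in practice: you can differentiate f analytically, or fall back to finite differences when the derivatives are tedious. Both helpers below assume the hypothetical unicycle model from the earlier snippet (the finite-difference epsilon is an arbitrary choice), and predict() at the end mirrors the two prediction equations above.

    def F_jacobian(x, u):
        """Analytic Jacobian of the hypothetical unicycle f, evaluated at (x, u)."""
        _, _, theta = x
        v, _ = u
        return np.array([[1.0, 0.0, -v * np.sin(theta) * dt],
                         [0.0, 1.0,  v * np.cos(theta) * dt],
                         [0.0, 0.0,  1.0]])

    def numerical_jacobian(func, x, eps=1e-6):
        """Finite-difference Jacobian of func at x, one column per component of x."""
        x = np.asarray(x, dtype=float)
        fx = np.asarray(func(x))
        J = np.zeros((fx.size, x.size))
        for i in range(x.size):
            dx = np.zeros(x.size)
            dx[i] = eps
            J[:, i] = (np.asarray(func(x + dx)) - fx) / eps
        return J

    def predict(mu, Sigma, u):
        """EKF prediction: propagate the mean through f, the covariance through F."""
        F = F_jacobian(mu, u)  # or: numerical_jacobian(lambda s: f(s, u), mu)
        return f(mu, u), F @ Sigma @ F.T + Q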

Measurement Update

The measurement update equations also look very similar to the linear Kalman filter’s:

    \begin{equation*}\begin{align}\tilde{z}_{k+1} &= z_{k+1} - h(\bar{\mu}_{k+1}) \\S_{k+1} &= H_{k+1} \bar{\Sigma}_{k+1} H_{k+1}^T + R_{k+1} \\K_{k+1} &= \bar{\Sigma}_{k+1} H_{k+1}^T S_{k+1}^{-1} \\\hat{\mu}_{k+1} &= \bar{\mu}_{k+1} + K_{k+1} \tilde{z}_{k+1} \\\hat{\Sigma}_{k+1} &= (I - K_{k+1} H_{k+1}) \bar{\Sigma}_{k+1} \end{align}\end{equation*}

Again, the H_{k+1} matrix is not the measurement matrix we saw in the linear Kalman filter. Like F_k in the prediction step, it is the Jacobian matrix of the non-linear measurement model h(x):

    \[H = \frac{\partial}{\partial x} h(x) = \begin{bmatrix} \frac{\partial h_1}{\partial x_1} & \cdots &  \frac{\partial h_1}{\partial x_{n_x}} \\ \vdots & \ddots & \vdots \\ \frac{\partial h_{n_z}}{\partial x_1} & \cdots &  \frac{\partial h_{n_z}}{\partial x_{n_x}} \end{bmatrix}\]

where h_i indicates the i-th component of h, and n_z is the dimension of the measurement vector.
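
Continuing the hypothetical range-bearing example, a sketch of the measurement update might look as follows; H_jacobian differentiates h analytically, and update() mirrors the five equations above.

    def H_jacobian(x):
        """Analytic Jacobian of the hypothetical range-bearing h, evaluated at x."""
        px, py, _ = x
        r2 = px**2 + py**2
        r = np.sqrt(r2)
        return np.array([[ px / r,   py / r,  0.0],
                         [-py / r2,  px / r2, 0.0]])

    def update(mu_bar, Sigma_bar, z):
        """EKF measurement update following the equations above."""
        H = H_jacobian(mu_bar)
        z_tilde = z - h(mu_bar)  # innovation (in practice, wrap the bearing to [-pi, pi])
        S = H @ Sigma_bar @ H.T + R               # innovation covariance
        K = Sigma_bar @ H.T @ np.linalg.inv(S)    # Kalman gain
        mu_hat = mu_bar + K @ z_tilde
        Sigma_hat = (np.eye(mu_bar.size) - K @ H) @ Sigma_bar
        return mu_hat, Sigma_hat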

Derivation

This section describes how the extended Kalman filter is derived. If you’re not interested, feel free to skip. The equations above are more than enough to get you started!

Linearization

Before jumping into the derivation, we first need to know how to linearize a non-linear function. Simply put, the extended Kalman filter is an extension of the linear Kalman filter that handles non-linear functions by approximating them with linear ones. This approximation is, of course, not perfect: the further the state moves from the point of linearization, the larger the mismatch between the approximating linear function and the true non-linear function becomes, and that is where the extended Kalman filter faces challenges. Nonetheless, let’s look at how linearization is done.

We use the Taylor series to linearize a function. A Taylor series describes a function as an infinite sum of terms computed from the function’s derivatives at a single point (Wikipedia). One can write a function f as the series:

    \[f(x) = f(a) + \frac{1}{1!} \left.\frac{df}{dx}\right|_{x=a}(x-a) + \frac{1}{2!} \left.\frac{d^2 f}{dx^2}\right|_{x=a}(x-a)^2 + \cdots\]

And in vector form:

    \[f(\mathbf{x}) = f(\mathbf{a}) + \frac{1}{1!}\left.\frac{\partial f}{\partial \mathbf{x}}\right|_{\mathbf{x}=\mathbf{a}} (\mathbf{x}-\mathbf{a}) + \frac{1}{2!}\left.\frac{\partial^2 f}{\partial \mathbf{x}^2}\right|_{\mathbf{x}=\mathbf{a}} (\mathbf{x}-\mathbf{a})^2 + \cdots\]

You can see that an infinite number of higher-order terms follow the second-order term written above. You can also see that once the series is truncated after the first-order term, the function becomes linear in the deviation (\mathbf{x}-\mathbf{a}). We can therefore approximate the non-linear function f as:

    \begin{equation*}\begin{align}f &\approx f(\mathbf{a}) + \left.\frac{\partial f}{\partial \mathbf{x}}\right|_{\mathbf{x}=\mathbf{a}} (\mathbf{x}-\mathbf{a}) \\&= f(\mathbf{a}) + F|_{\mathbf{x}=\mathbf{a}} (\mathbf{x}-\mathbf{a})\end{align}\end{equation*}
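
As a quick concrete example (the choice of \sin is mine, purely for illustration), linearizing f(x) = \sin x about a point a gives:

    \[\sin x \approx \sin a + \cos a \, (x - a)\]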

Note that a function f can be linearized at any arbitrary point \mathbf{a}; the result is a good approximation of the non-linear function f only in a neighborhood of \mathbf{a}.
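
A small numeric check in Python (both the function and the test points are arbitrary choices) shows the approximation error growing as we move away from the linearization point:

    import numpy as np

    a = 1.0  # linearization point (arbitrary choice)
    lin = lambda x: np.sin(a) + np.cos(a) * (x - a)  # first-order Taylor approximation of sin

    for x in (1.1, 1.5, 2.5):  # points increasingly far from a
        print(f"x={x}: sin(x)={np.sin(x):.4f}, linearized={lin(x):.4f}, "
              f"error={abs(np.sin(x) - lin(x)):.4f}")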

To apply this to the extended Kalman filter derivation, we write the linearization in terms of the estimation error, namely x - \hat{x} or x - \bar{x}, where \hat{x} and \bar{x} both denote estimates of the variable x and x is the true value. Note that the true value x is not accessible to us because of the noise. However, we want the estimate to be as close to the true value as possible, which means minimizing x - \hat{x} or x - \bar{x}.

Prediction

We are given the posterior estimates of the mean and covariance at time k, and we want the corresponding estimates for the next time step. First, we linearize about the current estimate:

    \begin{equation*}\begin{align}x_{k+1} &= f(x_k, u_k) + w_k \\&\approx f(\hat{\mu}_k, u_k) + \left.\frac{\partial f}{\partial x_k}\right|_{x_k=\hat{\mu}_k} (x_k - \hat{\mu}_k) + \left.\frac{\partial f}{\partial u_k}\right|_{u_k=u_k} (u_k - u_k)+ w_k \\&= f(\hat{\mu}_k, u_k) + \left.\frac{\partial f}{\partial x_k}\right|_{x_k=\hat{\mu}_k} (x_k - \hat{\mu}_k) + w_k \\&= f(\hat{\mu}_k, u_k) + F_k|_{x_k=\hat{\mu}_k} (x_k - \hat{\mu}_k) + w_k \\&= f(\hat{\mu}_k, u_k) + F_k|_{x_k=\hat{\mu}_k} \delta \hat{x}_k + w_k\end{align}\end{equation*}

Therefore, taking expectations and using E[w_k] = 0 together with the assumption that the posterior estimate is unbiased, i.e., E[x_k - \hat{\mu}_k] = 0:

    \begin{equation*}\begin{align}E[x_{k+1}] &= \bar{\mu}_{k+1} \\&= E[f(\hat{\mu}_k, u_k)] + F_k|_{x_k=\hat{\mu}_k} E[(x_k - \hat{\mu}_k)] + E[w_k] \\&= \boxed{f(\hat{\mu}_k, u_k)}\end{align}\end{equation*}

For the covariance, the cross terms involving E[\delta \hat{x}_k w_k^T] vanish because the process noise is assumed uncorrelated with the estimation error:

    \begin{equation*}\begin{flalign}& E[(x_{k+1}-E[x_{k+1}])(x_{k+1}-E[x_{k+1}])^T] \\&= E[\left(f(\hat{\mu}_k, u_k) + F_k|_{x_k=\hat{\mu}_k} (x_k - \hat{\mu}_k) + w_k - f(\hat{\mu}_k, u_k)\right) \\ &\quad\quad\quad\quad\quad\quad\left(f(\hat{\mu}_k, u_k) + F_k|_{x_k=\hat{\mu}_k} (x_k - \hat{\mu}_k) + w_k - f(\hat{\mu}_k, u_k)\right)^T] \\&= E[(F_k|_{x_k=\hat{\mu}_k} (x_k - \hat{\mu}_k) + w_k)(F_k|_{x_k=\hat{\mu}_k} (x_k - \hat{\mu}_k) + w_k)^T] \\&= E[(F_k|_{x_k=\hat{\mu}_k} \delta \hat{x}_k + w_k)(F_k|_{x_k=\hat{\mu}_k} \delta \hat{x}_k + w_k)^T] \\&= F_k|_{x_k=\hat{\mu}_k} E[\delta \hat{x}_k {\delta \hat{x}_k}^T] F_k|_{x_k=\hat{\mu}_k}^T + E[w_k w_k^T] \\&= F_k|_{x_k=\hat{\mu}_k} E[\delta \hat{x}_k {\delta \hat{x}_k}^T] F_k|_{x_k=\hat{\mu}_k}^T + Q_k \\&= \boxed{F_k|_{x_k=\hat{\mu}_k} \hat{\Sigma}_k F_k|_{x_k=\hat{\mu}_k}^T + Q_k} = \bar{\Sigma}_{k+1}\end{flalign}\end{equation*}

Measurement Update

We can also linearize the measurement model:

    \begin{equation*}\begin{align}z_{k+1} &= h(x_{k+1}) + v_{k+1} \\&\approx h(\bar{\mu}_{k+1}) + \left.\frac{\partial h}{\partial x}\right|_{x_{k+1}=\bar{\mu}_{k+1}} (x_{k+1}-\bar{\mu}_{k+1}) +v_{k+1}\\&= h(\bar{\mu}_{k+1}) + H_{k+1}|_{x_{k+1}=\bar{\mu}_{k+1}} \delta \bar{x}_{k+1} +v_{k+1} \\ \\E[z_{k+1}] &= E[h(\bar{\mu}_{k+1}) + H_{k+1}|_{x_{k+1}=\bar{\mu}_{k+1}} \delta \bar{x}_{k+1} +v_{k+1}] \\&= \boxed{h(\bar{\mu}_{k+1})}\end{align}\end{equation*}

Looking back at the measurement update part of the previous Kalman filter derivation, we need the cross-covariance and innovation covariance terms. Let’s get them now, again using the fact that the noise (here v_{k+1}) is uncorrelated with the estimation error:

    \begin{equation*}\begin{align}P_{x_{k+1}z_{k+1}} &= E[(x_{k+1} - E[x_{k+1}])(z_{k+1} - E[z_{k+1}])^T] \\&= E[(x_{k+1}-\bar{\mu}_{k+1})(H_{k+1}|_{x_{k+1}=\bar{\mu}_{k+1}}(x_{k+1}-\bar{\mu}_{k+1}) + v_{k+1})^T] \\&= E[(x_{k+1}-\bar{\mu}_{k+1})((x_{k+1}-\bar{\mu}_{k+1})^T H_{k+1}|_{x_{k+1}=\bar{\mu}_{k+1}}^T + v_{k+1}^T)]  \\&= \bar{\Sigma}_{k+1} H_{k+1}|_{x_{k+1}=\bar{\mu}_{k+1}}^T \\ \\P_{z_{k+1}z_{k+1}} &= E[(z_{k+1}-E[z_{k+1}])(z_{k+1}-E[z_{k+1}])^T] \\&= E[(H_{k+1}|_{x_{k+1}=\bar{\mu}_{k+1}}(x_{k+1} - \bar{\mu}_{k+1}) + v_{k+1})(H_{k+1}|_{x_{k+1}=\bar{\mu}_{k+1}}(x_{k+1} - \bar{\mu}_{k+1}) + v_{k+1})^T] \\&= E[(H_{k+1}|_{x_{k+1}=\bar{\mu}_{k+1}} \delta \bar{x}_{k+1} + v_{k+1})(H_{k+1}|_{x_{k+1}=\bar{\mu}_{k+1}} \delta \bar{x}_{k+1} + v_{k+1})^T] \\&= H_{k+1}|_{x_{k+1}=\bar{\mu}_{k+1}} E[\delta \bar{x}_{k+1} {\delta \bar{x}_{k+1}}^T] H_{k+1}|_{x_{k+1}=\bar{\mu}_{k+1}}^T + E[v_{k+1} v_{k+1}^T] \\&= H_{k+1}|_{x_{k+1}=\bar{\mu}_{k+1}} \bar{\Sigma}_{k+1} H_{k+1}|_{x_{k+1}=\bar{\mu}_{k+1}}^T + R_{k+1}\end{align}\end{equation*}

Now, let’s get the posterior mean and covariance:

    \begin{equation*}\begin{align}\hat{\mu}_{k+1} &= \bar{\mu}_{k+1} + P_{x_{k+1}z_{k+1}} P_{z_{k+1}z_{k+1}}^{-1}(z_{k+1} - E[z_{k+1}]) \\&= \bar{\mu}_{k+1} + \bar{\Sigma}_{k+1} H_{k+1}|_{x_{k+1}=\bar{\mu}_{k+1}}^T \\ &\quad\quad\quad\quad \cdot (H_{k+1}|_{x_{k+1}=\bar{\mu}_{k+1}} \bar{\Sigma}_{k+1} H_{k+1}|_{x_{k+1}=\bar{\mu}_{k+1}}^T + R_{k+1})^{-1}\left(z_{k+1} - h(\bar{\mu}_{k+1})\right) \\&= \boxed{\bar{\mu}_{k+1} + K_{k+1}\left(z_{k+1} - h(\bar{\mu}_{k+1})\right)} \\ \\\hat{\Sigma}_{k+1} &= \bar{\Sigma}_{k+1} - P_{x_{k+1}z_{k+1}} P_{z_{k+1}z_{k+1}}^{-1} P_{x_{k+1}z_{k+1}}^T \\&= \bar{\Sigma}_{k+1} - K_{k+1} P_{x_{k+1}z_{k+1}}^T \\&= \bar{\Sigma}_{k+1} - K_{k+1}  H_{k+1}|_{x_{k+1}=\bar{\mu}_{k+1}} \bar{\Sigma}_{k+1}^T \\&= \bar{\Sigma}_{k+1} - K_{k+1}  H_{k+1}|_{x_{k+1}=\bar{\mu}_{k+1}} \bar{\Sigma}_{k+1} \\ &= \boxed{(I - K_{k+1}  H_{k+1}|_{x_{k+1}=\bar{\mu}_{k+1}}) \bar{\Sigma}_{k+1}}\\K_{k+1} &= \bar{\Sigma}_{k+1} H_{k+1}|_{x_{k+1}=\bar{\mu}_{k+1}}^T (H_{k+1}|_{x_{k+1}=\bar{\mu}_{k+1}} \bar{\Sigma}_{k+1} H_{k+1}|_{x_{k+1}=\bar{\mu}_{k+1}}^T + R_{k+1})^{-1} \\&= \bar{\Sigma}_{k+1} H_{k+1}|_{x_{k+1}=\bar{\mu}_{k+1}}^T  S_{k+1}^{-1} \\S_{k+1} &= H_{k+1}|_{x_{k+1}=\bar{\mu}_{k+1}} \bar{\Sigma}_{k+1} H_{k+1}|_{x_{k+1}=\bar{\mu}_{k+1}}^T + R_{k+1}\end{align}\end{equation*}
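
Putting everything together, here is a minimal end-to-end sketch that reuses the hypothetical f, h, Q, R, predict(), and update() from the earlier snippets to run the filter on simulated data:

    # Minimal usage sketch; all model functions come from the earlier snippets.
    rng = np.random.default_rng(0)

    mu = np.array([1.0, 0.0, 0.0])  # initial estimate [px, py, theta]
    Sigma = 0.1 * np.eye(3)         # initial covariance
    x_true = mu.copy()              # simulated ground truth

    for k in range(50):
        u = np.array([1.0, 0.1])    # constant speed and turn rate (assumed)

        # Simulate the true system and a noisy measurement.
        x_true = f(x_true, u) + rng.multivariate_normal(np.zeros(3), Q)
        z = h(x_true) + rng.multivariate_normal(np.zeros(2), R)

        # EKF cycle: predict, then correct with the measurement.
        mu, Sigma = predict(mu, Sigma, u)
        mu, Sigma = update(mu, Sigma, z)

    print("final estimate:", mu)
    print("final true state:", x_true)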
