Linearization Error

In my last post on the extended Kalman filter derivation, I talked about the linearization step and briefly explained how linearizing lets us use the linear Kalman filter to approximate a nonlinear system. I felt it might help a little to demonstrate what linearization error can do, and how a bad linearization point can affect our estimates.

Taylor Series

So, we’ve talked about how the linearization is done using the Taylor series. There are many other series expansions in mathematics, but the Taylor series is surely one of the most popular. Again, the Taylor series expansion looks like this:

    \[f(\mathbf{x}) = f(\mathbf{a}) + \left.\frac{\partial f}{\partial \mathbf{x}}\right|_{\mathbf{x}=\mathbf{a}} (\mathbf{x} -\mathbf{a}) + \frac{1}{2!}\left.\frac{\partial^2 f}{\partial \mathbf{x}^2}\right|_{\mathbf{x}=\mathbf{a}} (\mathbf{x} - \mathbf{a})^2 +\frac{1}{3!}\left.\frac{\partial^3 f}{\partial \mathbf{x}^3}\right|_{\mathbf{x}=\mathbf{a}} (\mathbf{x} - \mathbf{a})^3 +\cdots\]

Well, let’s first check that this holds true, as it should. Here’s an example:

    \begin{equation*} \begin{align}f(x) &= 3x^3 + 2x^2 - 2x + 7 - 3x^{-1}\end{align}\end{equation*}

Since I cannot take an infinite number of derivatives (the x^{-1} term has nonzero derivatives of every order), I took the liberty of keeping only terms up to 4th order.

    \begin{equation*}\begin{align}\frac{df}{dx} &= 9x^2 + 4x - 2 + 3x^{-2} \\\frac{d^2f}{dx^2} &= 18x + 4 - 6x^{-3} \\\frac{d^3f}{dx^3} &= 18 + 18x^{-4} \\\frac{d^4f}{dx^4} &= -72x^{-5} \\\end{align} \end{equation*}

Let’s evaluate this function directly at x = 2.0, and also through its Taylor series about a = 1.0:

    \begin{equation*}\begin{align}f(2.0) &= 3 \cdot 8 + 2 \cdot 4 - 2 \cdot 2 + 7 - 3 \cdot 0.5 = 33.5 \\f(2.0) &\approx f(1.0) + \left.\frac{df}{dx}\right|_{x=1.0} (2.0 - 1.0) + \frac{1}{2}\left.\frac{d^2f}{dx^2}\right|_{x=1.0} (2.0 - 1.0)^2 \\&\quad\quad + \frac{1}{6} \left.\frac{d^3f}{dx^3}\right|_{x=1.0} (2.0 - 1.0)^3 + \frac{1}{24}\left.\frac{d^4f}{dx^4}\right|_{x=1.0} (2.0 - 1.0)^4 \\&= 7 + 14 + \frac{1}{2} \cdot 16 + \frac{1}{6} \cdot 36 - \frac{1}{24} \cdot 72 \\&= 32.0\end{align}\end{equation*}

Well, this is disappointing: the truncated series is off by 1.5. What if I move the linearization point closer? It should help, because the dominant term f(a) gets closer to the true value, and the neglected terms scale with (x - a)^n for n ≥ 5, so they shrink quickly as x - a shrinks. If I take a = 1.5 instead of 1.0, the 4th-order sum becomes 33.493827. If a = 1.8, it becomes 33.499975. If a = 1.95, it becomes 33.49999998. So the approximation does come closer to the true value as the linearization point, a, comes closer to the evaluation point, x.
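The truncated sums above are easy to reproduce. Below is a minimal Python sketch, assuming the derivative formulas listed earlier; the names `derivatives` and `taylor_4th_order` are illustrative, not taken from any original script.

```python
import math

# The example function and its first four derivatives, as derived above.
def f(x):
    return 3 * x**3 + 2 * x**2 - 2 * x + 7 - 3 / x

derivatives = [
    f,                                          # f(x)
    lambda x: 9 * x**2 + 4 * x - 2 + 3 / x**2,  # f'(x)
    lambda x: 18 * x + 4 - 6 / x**3,            # f''(x)
    lambda x: 18 + 18 / x**4,                   # f'''(x)
    lambda x: -72 / x**5,                       # f''''(x)
]

def taylor_4th_order(x, a):
    """Sum f^(n)(a) / n! * (x - a)^n for n = 0..4; higher-order terms are dropped."""
    return sum(
        d(a) / math.factorial(n) * (x - a) ** n
        for n, d in enumerate(derivatives)
    )

print(f"true f(2.0) = {f(2.0)}")
for a in (1.0, 1.5, 1.8, 1.95):
    approx = taylor_4th_order(2.0, a)
    print(f"a = {a}: 4th-order sum = {approx:.8f}, error = {f(2.0) - approx:.2e}")
```

Using `math.factorial(n)` for every coefficient keeps each term matched to the general series at the top of this section, so the 1/2! on the second-order term cannot slip.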

But that doesn’t fully convince us yet, does it? I want to know whether the Taylor series really holds. Let’s take a conveniently simple function

    \[f(x) = \exp(x)\]

whose derivatives of every order are \exp(x) itself.

As above, I evaluate \exp(2.0) using the expansion point a = 1.0. Of course, I cannot compute infinitely many terms, so I summed the first 100. A small Python script gives the following:

Real: 7.38905609893065 vs. linearized: 7.389056098930649

The two agree down to the last floating-point digit. It is only one sample function, but it does show that the Taylor series works, which is good to know 🙂
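The script itself isn’t shown in the post; a sketch along these lines would do the same job. The function name `exp_taylor` and its `num_terms` parameter are hypothetical.

```python
import math

def exp_taylor(x, a, num_terms=100):
    """Taylor series of exp(x) about a, truncated after num_terms terms.

    Every derivative of exp is exp, so the n-th term is exp(a) * (x - a)**n / n!.
    """
    return sum(
        math.exp(a) * (x - a) ** n / math.factorial(n)
        for n in range(num_terms)
    )

real = math.exp(2.0)
approx = exp_taylor(2.0, 1.0)
print(f"Real: {real} vs. Taylor series (100 terms): {approx}")
```

Since every term only needs exp(a) and a factorial, this is about the cheapest possible check that the full series converges to the true value; 99! still fits comfortably in a double, so the division is safe at 100 terms.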

Let’s move on. The Taylor series does work, but in the EKF we keep only the first-order term. Let’s visualize how that affects the approximation.

One-dimensional Linearization Example

Here is a 1D example. What I’m showing may be obvious and trivial, but I thought it would help to visualize it.

    \[f(x) = x^2-2x+1\]


The plot above shows what happens under the linear approximation. The quadratic function is linearized about x = 1.5, and two points are evaluated: x = 2.0 and x = 3.0. You can see that the first-order approximation has errors, and that they grow as the points get further from the linearization point. The approximation does a simple thing: to the function value at the linearization point, it adds \Delta y = \text{slope} \times\Delta x, where the slope is the derivative at the linearization point.
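The numbers behind such a plot can be checked in a few lines. This is a small sketch (the names `df` and `linearize` are illustrative) that linearizes f about x = 1.5 and compares against the true values at the two evaluation points:

```python
def f(x):
    return x**2 - 2 * x + 1

def df(x):
    return 2 * x - 2

def linearize(a):
    """First-order Taylor approximation of f about x = a."""
    return lambda x: f(a) + df(a) * (x - a)

a = 1.5
f_lin = linearize(a)
for x in (2.0, 3.0):
    error = f(x) - f_lin(x)
    print(f"x = {x}: true {f(x):.2f}, linearized {f_lin(x):.2f}, error {error:.2f}")
```

Because f''(x) = 2 everywhere, the first-order error here is exactly (x - 1.5)^2: 0.25 at x = 2.0 and 2.25 at x = 3.0. Tripling the distance from the linearization point multiplies the error by nine.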

Two-dimensional Example

Let’s also see how the covariance matrix changes under linearization. Again, the first-order Taylor series is only an approximation of the nonlinear function f(x), and it will never be exact. To minimize the linearization error, we want the linearization point to be as close to the true value as possible.

In the context of the extended Kalman filter, the linearization point is the current estimate. Let’s look at the following example:

    \begin{equation*}\begin{align}x_{k+1} &= x_k + x_k y_k \cdot 0.1 \\y_{k+1} &= y_k + \sin(y_k) \cdot 0.1\end{align}\end{equation*}

where the derivatives are:

    \begin{equation*}\begin{align}\frac{\partial x_{k+1}}{\partial x_k} &= 1 + 0.1 \cdot y_k \\\frac{\partial x_{k+1}}{\partial y_k} &= 0.1 \cdot x_k \\\frac{\partial y_{k+1}}{\partial x_k} &= 0 \\ \frac{\partial y_{k+1}}{\partial y_k} &= 1 + 0.1 \cdot \cos(y_k)\end{align}\end{equation*}

Let’s say the current covariance matrix is:

    \[\Sigma_k = \begin{bmatrix}1.5 & 0 \\ 0 & 1.0\end{bmatrix}\]

and assume there is no process noise (Q=0). According to the EKF equations, the predicted covariance matrix becomes \bar{\Sigma}_{k+1} = F \Sigma_k F^T, where F=\left.\frac{\partial f}{\partial x}\right|_{x=x_0} is the Jacobian of the process model evaluated at the current estimate. Let’s also assume the current state is x_0 = [2.5, 1.5]^T. Carrying out the calculation, we get:

    \[\bar{\Sigma}_{k+1} = \begin{bmatrix}2.04625 & 0.25176843 \\ 0.25176843 & 1.01419748 \end{bmatrix}\]
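For reference, here is the intermediate step. Plugging x_k = 2.5 and y_k = 1.5 into the derivatives above gives the Jacobian, and because \Sigma_k is diagonal, each entry of F \Sigma_k F^T reduces to a short sum:

    \[F = \begin{bmatrix}1 + 0.1 \cdot 1.5 & 0.1 \cdot 2.5 \\ 0 & 1 + 0.1 \cdot \cos(1.5)\end{bmatrix} \approx \begin{bmatrix}1.15 & 0.25 \\ 0 & 1.00707372\end{bmatrix}\]

    \begin{equation*}\begin{align}\bar{\Sigma}_{11} &= 1.15^2 \cdot 1.5 + 0.25^2 \cdot 1.0 = 2.04625 \\\bar{\Sigma}_{12} &= 1.15 \cdot 0 \cdot 1.5 + 0.25 \cdot 1.00707372 \cdot 1.0 \approx 0.25176843 \\\bar{\Sigma}_{22} &= 0^2 \cdot 1.5 + 1.00707372^2 \cdot 1.0 \approx 1.01419748\end{align}\end{equation*}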

Now, let’s visualize what the true distribution actually looks like. For this, I drew 500 samples from the original covariance matrix (\Sigma_k) and propagated each point through the equations above to get the samples at time k+1.
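The original script isn’t shown, so here is a sketch of the same experiment with NumPy. It assumes the samples are centered at x_0 and uses an arbitrary seed, so the sample covariance will not match the plotted figure exactly; the function names `propagate` and `jacobian` are illustrative.

```python
import numpy as np

def propagate(state):
    """Nonlinear process model; accepts a (2,) state or an (N, 2) batch."""
    x, y = state[..., 0], state[..., 1]
    return np.stack([x + x * y * 0.1, y + np.sin(y) * 0.1], axis=-1)

def jacobian(state):
    """Analytic Jacobian of propagate() at a single (2,) state."""
    x, y = state
    return np.array([
        [1 + 0.1 * y, 0.1 * x],
        [0.0, 1 + 0.1 * np.cos(y)],
    ])

x0 = np.array([2.5, 1.5])          # current estimate = linearization point
sigma_k = np.array([[1.5, 0.0],
                    [0.0, 1.0]])   # current covariance

# EKF covariance prediction with Q = 0: Sigma_bar = F Sigma_k F^T
F = jacobian(x0)
sigma_ekf = F @ sigma_k @ F.T

# Monte Carlo reference: sample around x0, push each sample through the
# nonlinear model, and measure the covariance of the propagated cloud.
rng = np.random.default_rng(seed=0)
samples_k = rng.multivariate_normal(x0, sigma_k, size=500)
samples_k1 = propagate(samples_k)
sigma_mc = np.cov(samples_k1, rowvar=False)

print("EKF predicted covariance:\n", sigma_ekf)
print("Sample covariance of propagated points:\n", sigma_mc)
```

Printing `sigma_ekf` next to `sigma_mc` gives a numeric view of the same gap that the ellipses show visually.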

The figure above shows the 500 samples at two different times: k (in blue) and k+1 (in red). The ellipses represent 3\sigma error bounds. Note that the blue ellipse captures the blue samples very well. After these points are propagated to k+1, their shape changes drastically (in red). The red ellipse is the covariance computed directly from the 500 propagated samples, which I treat as the true covariance. Given how the points are distributed, a Gaussian is probably not the best representation, but the sample covariance is still the true covariance by definition (up to sampling error). The green ellipse shows the error bound of the covariance predicted with the EKF equation above. It does a fair job of capturing the mean and most of the red samples, but it fails to capture the long spread along the x direction.
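A 3\sigma ellipse can be drawn from any 2×2 covariance through its eigendecomposition: the eigenvectors give the axis directions, and the square roots of the eigenvalues give the standard deviations along them. Here is a minimal NumPy sketch; the function name and defaults are illustrative and not necessarily how the figures above were produced.

```python
import numpy as np

def covariance_ellipse(mean, cov, n_sigma=3.0, num_points=100):
    """Return a (num_points, 2) array tracing the n-sigma contour of a 2D Gaussian.

    Every returned point p satisfies (p - mean)^T cov^{-1} (p - mean) = n_sigma^2.
    """
    eigvals, eigvecs = np.linalg.eigh(cov)  # cov is symmetric
    theta = np.linspace(0.0, 2.0 * np.pi, num_points)
    unit_circle = np.stack([np.cos(theta), np.sin(theta)])  # shape (2, num_points)
    # Stretch the circle by n_sigma standard deviations along each principal
    # axis, rotate it onto the eigenvectors, then shift it to the mean.
    transform = eigvecs @ np.diag(n_sigma * np.sqrt(eigvals))
    return (transform @ unit_circle).T + mean
```

Passing a mean and covariance (for example, the sample mean and sample covariance of the propagated points) returns the contour points; with matplotlib, `plt.plot(pts[:, 0], pts[:, 1])` draws them.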

This is not a case where the current estimate is far from the truth; at time k, it actually matches perfectly. However, because of the nonlinearity of the function, the EKF’s prediction does not fully capture the true distribution. How severe this is depends on how nonlinear the function is, as well as on how close the current estimate is to the truth.


Okay. So in this post, I tried to cover the consequences of linearization. The first-order approximation struggles to capture the true distribution because of two things (among many others): 1) the nonlinearity of the function, and 2) how close the current estimate is to the true value. There are ways to deal with 1), such as using different estimation methods or keeping higher-order terms. Better handling of the nonlinearity reduces the inaccuracy/inconsistency of the filter and brings the current estimate closer to the truth, which also addresses 2). The bottom line is that if the function you’re dealing with is too nonlinear, you need something better than the first-order approximation, or no approximation at all.

Hope this post is helpful to you all. If you have any questions or comments, please leave them here! I’d love to know what your thoughts are.
