If you have worked on an estimation problem, whether it’s a probabilistic approach or not, the chances are that you heard about Kalman filter at some point. Kalman filter is an estimation technique that gives you the best/optimal estimates (or states) if your system is *linear* and your noises are (zero mean and) *normally* distributed (i.e. noise is sampled from a Gaussian distribution).

(If you’re curious about why *Kalman* filter, I’m referring to the history section of Wikipedia)

But before we get into the details, I want to give a high-level idea of what an *estimation* is (within my limited knowledge of course). We usually estimate something if we don’t really have direct measurements of it. For an example, you want to know where you are but you can only measure how fast you’re moving. Or you want to know how fast you’re moving but can only measure where you are. None of the *measurements* here gives the direct information about the state you want to know — your position or your speed, respectively. However, you *do* know the relationship between your state and the measurement; position is integration of speed, and speed is derivative of position. From this relationship, you can *infer* your state from measurements. Estimation algorithms let you do that: *infer* the states from non-direct measurements (of the states).

We want to use a Kalman filter if the system is *linear* and the noises are *normally* distributed (Gaussian). These ‘*linear*‘ and’ *normally *distributed *noise*‘ are important assumptions. Once they’re violated (and depending on how severely) one would use other estimation techniques such as nonlinear Kalman filters and non-parametric filters. A few of these will be covered in later posts. For this one, we’ll stick to the *linear *Kalman Filter.

## Continuous or Discrete

Your system can be continuous or discrete. It depends on the nature of your estimation problem and the devices you’re using. With many digital devices these days, Kalman filter implementations take a discrete form (sorry, author does not guarantee this). There are implementations done in continuous forms of course, and it involves a different sets of equations (e.g. matrix exponential instead of a transition model/matrix). Continuous implementations usually turn out to be more complicated than a discrete one. We’ll stick to an implementation in discrete form.

## System Setup

We need a system to work with. It can be any form a linear equation, but it typically takes a form of: \begin{aligned}x_{k+1} &= F_k x_k + B_k u_k + w_k \\ z_k &= H_k x_k + v_k \\ \\ x_k &= \text{state vector at time k} \\ u_k &= \text{control input at time k} \\ z_k &= \text{measurement vector at time k} \\ F_k &= \text{state transition model at time k} \\ B_k &= \text{control input model at time k} \\ H_k &= \text{measurement model at time k} \\ w_k &= \text{process noise at time k} \\ v_k &= \text{measurement noise at time k} \end{aligned}

A state transition model, F_k \in \mathcal{R}^{n_x \times n_x}, is a real matrix of a size n_x \times n_x where n_x represents the length of state vector x_k. This matrix describes how a state x changes from one time step to the next time step (i.e. k \rightarrow k+1). It is your job to define the state and the transition model correctly that the transition from one time to the next is sufficiently described. Similarly, the control input model, B_k \in \mathcal{R}^{n_x \times n_u} is a matrix of size n_x \times n_u (where n_u represents the size of control input u_k) that describes the relationship between the state x_k and control input u_k. Again, you have to define your state so that how the control inputs affects the system is fully described. The measurement model, H_k \in \mathcal{R}^{n_z \times n_x} relates the available measurements to the state vector; n_z represents the length of measurement vector z_k.

For a first time reader, those noise vectors, w_k and v_k may seem weird. However, remember that everything we do involves some kind of noise. All sensor readings, measurements, are imperfect and is prone to any kind of noise. The state can vary over time without us knowing how and when. These are modeled as the measurement and processing noises, respectively. Without those terms, the equation above are deterministic — meaning that if we know the state x_k and u_k, we know the state x_{k+1} *perfectly. *However, those noises are the reason we have estimation algorithms.

For the discussion here, let’s assume the system is invariant. This will make: \begin{aligned}F_k &= F \\ B_k &= B \\ H_k &=H \end{aligned}

## State Vector and Covariance Matrix

Given that the processing and measurement noises are *normally* distributed, the probability distribution of x_k, p(x_k) is a normal (Gaussian) distribution. A normal distribution has a great property that the entire distribution can be described with two parameters: *mean* and *covariance*. They’re also referred as the *first *and *second* moment, respectively. To formally define: \begin{aligned}\mu_k &= E[x_k] \\ \Sigma_k &= \mathrm{Cov}(x_k, x_k) = E[(x_k - E[x_k])(x_k - E[x_k])^T] \end{aligned} where E[\cdot] is an expectation operator, and is formally defined as: E[x_k] \triangleq \int_{-\infty}^\infty x_k \cdot p(x_k) dx

Similarly, we also define the covariance of the process and measurement noises as well. \begin{aligned} Q_k &= \mathrm{Cov}(w_k, w_k) = E[(w_k - E[w_k])(w_k - E[w_k])^T] \\ R_k &= \mathrm{Cov}(v_k, v_k) = E[(v_k - E[v_k])(v_k - E[v_k])^T] \end{aligned} Also, these noises are zero mean: \begin{aligned}E[w_k] &= 0 \\ E[v_k] &= 0 \end{aligned} To simplify little bit, let’s assume the noise covariances are also time-invariant: Q_k=Q, R_k=R.

Since a normal distribution can be fully represented with the mean and covariance, naturally for Kalman filter implementation we keep track of only two numbers: mean and covariance. Thus, all of the Kalman filter equations below describe only two things at the end: mean and covariance. p(x_k) = \mathcal{N}(x_k; \mu_k, \Sigma_k)

## Algorithm Steps

Kalman filter is generally understood as a two step algorithm: 1) prediction, and 2) update. Prediction (or propagation) step does predict/propagate current state at time k to the next state at time k+1.

Above diagram (or a graphical model) depicts how the algorithm flows. Each circle represents a random variable where the subscript indicates the time step. Each arrow between two x_{(\cdot)} variables is the prediction step, and each arrow from a measurement z_{(\cdot)} to a state x_{(\cdot)} represents the update step. So given the above measurements: z_1, z_3 and z_4, it follows following steps:

- Initialize Kalman filter at time 0
- Predict to time 1
- Update with the measurement at time 1
- Predict to time 2
- Predict to time 3
- Update with the measurement at time 3
- Predict to time 4
- Update with the measurement at time 4
- Predict to time 5

## Prediction

When time between two time steps are changing, we do need to move the states forward in time. And the dynamic /system model defined in F and B defines that. To push the state forward: \begin{aligned}\mu_{k+1|k} &= F \mu_{k|k} + B u_k \\ \Sigma_{k+1|k} &= F \Sigma_k F^T + Q \end{aligned} where \mu_{k+1|k} and \Sigma_{k+1|k} represent the predicted mean and predicted covariance at time k+1 from time k, respectively. Prediction will almost always happen if any time progression occurs.

## Update

Once measurement is available (z_k exists), then we can apply this measurement to correct or update the current estimates. Update step involves a few more equations than the prediction step: \begin{aligned} \tilde{z}_{k+1} &= z_{k+1} - H \mu_{k+1|k} & \text{(measurement innovation)} \\ S &= H \Sigma_{k+1|k} H^T + R & \text{(innovation covariance)} \\ K &= \Sigma_{k+1|k} H^T S^{-1} & \text{(Kalman gain)} \\ \mu_{k+1|k+1} &= \mu_{k+1|k} + K \tilde{z}_{k+1} & \text{(updated mean)} \\ \Sigma_{k+1|k+1} &= (I - K H) \Sigma_{k+1|k} & \text{(updated covariance)} \end{aligned}

Note that each equation line has a name or description in parentheses. We will go over what each of them can tell us about the estimator later.

Relating to the above figure, the measurement at time 2 doesn’t exist. When no measurement is available, the prior estimate becomes the posterior estimate: \mu_{k+1|k+1}=\mu_{k+1|k}, \Sigma_{k+1|k+1}=\Sigma_{k+1|k}.

Please leave a comment of what you think! If you have any question/comment, please leave them here 🙂