If you have worked on an estimation problem, whether you took a probabilistic approach or not, chances are you have heard of the Kalman filter at some point. The Kalman filter is an estimation technique that gives you the best/optimal estimates (or states) if your system is *linear* and your noises are (zero-mean and) *normally* distributed (i.e. the noise is sampled from a Gaussian distribution).

(If you’re curious about why it’s called the *Kalman* filter, I’ll refer you to the history section of the Wikipedia article.)

But before we get into the details, I want to give a high-level idea of what an *estimation* is (within my limited knowledge, of course). We usually estimate something when we don’t have direct measurements of it. For example, you want to know where you are but you can only measure how fast you’re moving. Or you want to know how fast you’re moving but can only measure where you are. Neither *measurement* gives direct information about the state you want to know — your position or your speed, respectively. However, you *do* know the relationship between your state and the measurement: position is the integral of speed, and speed is the derivative of position. From this relationship, you can *infer* your state from measurements. Estimation algorithms let you do exactly that: *infer* the states from indirect measurements (of the states).

We want to use a Kalman filter if the system is *linear* and the noises are *normally* distributed (Gaussian). These ‘*linear*’ and ‘*normally distributed noise*’ assumptions are important. Once they’re violated (and depending on how severely), one would use other estimation techniques such as nonlinear Kalman filters and non-parametric filters. A few of these will be covered in later posts. For this one, we’ll stick to the *linear* Kalman filter.

## Continuous or Discrete

Your system can be continuous or discrete. It depends on the nature of your estimation problem and the devices you’re using. With so many digital devices these days, most Kalman filter implementations take a discrete form (sorry, the author does not guarantee this). There are implementations in continuous form as well, but they involve a different set of equations (e.g. a matrix exponential instead of a transition model/matrix), and continuous implementations usually turn out to be more complicated than discrete ones. We’ll stick to an implementation in discrete form.

## System Setup

We need a system to work with. It can be any form of linear equation, but it typically takes the form:

$$x_k = F_k x_{k-1} + B_k u_k + w_k$$

$$z_k = H_k x_k + v_k$$

A state transition model, $F_k$, is a real matrix of size $n \times n$, where $n$ is the length of the state vector $x_k$. This matrix describes how the state changes from one time step to the next (i.e. from $x_{k-1}$ to $x_k$). It is your job to define the state and the transition model so that the transition from one time step to the next is sufficiently described. Similarly, the control input model, $B_k$, is a matrix of size $n \times m$ (where $m$ is the length of the control input $u_k$) that describes the relationship between the state $x_k$ and the control input $u_k$. Again, you have to define your state so that how the control input affects the system is fully described. The measurement model, $H_k$, is a matrix of size $l \times n$ that relates the available measurements to the state vector; $l$ is the length of the measurement vector $z_k$.
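As a concrete (and entirely hypothetical) example of these matrices, here is a minimal NumPy sketch of a 1-D constant-velocity system, where the state holds position and velocity and only position is measured:

```python
import numpy as np

# Illustrative 1-D constant-velocity system (hypothetical values).
# State x = [position, velocity], so n = 2.
dt = 0.1  # time step (assumed)

# State transition model F (n x n): position += velocity * dt
F = np.array([[1.0, dt],
              [0.0, 1.0]])

# Control input model B (n x m, m = 1): an acceleration input u
# affects position by 0.5*dt^2 and velocity by dt
B = np.array([[0.5 * dt**2],
              [dt]])

# Measurement model H (l x n, l = 1): we measure position only
H = np.array([[1.0, 0.0]])

x = np.array([0.0, 1.0])   # state: at the origin, moving at 1 m/s
u = np.array([0.0])        # no control input
x_next = F @ x + B @ u     # noiseless state transition
z = H @ x_next             # noiseless measurement of the new state
```

Each matrix shape matches the definitions above: `F` is $2 \times 2$, `B` is $2 \times 1$, and `H` is $1 \times 2$.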

For a first-time reader, those noise vectors, $w_k$ and $v_k$, may seem strange. However, remember that everything we do involves some kind of noise. All sensor readings (measurements) are imperfect and prone to noise. The state can also vary over time without us knowing how or when. These are modeled as the measurement noise $v_k$ and the process noise $w_k$, respectively. Without those terms, the equations above are deterministic — meaning that if we knew the state $x_{k-1}$ and the input $u_k$, we would know the state $x_k$ *perfectly*. Those noises are the reason we have estimation algorithms.

For the discussion here, let’s assume the system is time-invariant. This will make:

$$F_k = F, \quad B_k = B, \quad H_k = H$$

## State Vector and Covariance Matrix

Given that the process and measurement noises are *normally* distributed, the probability distribution of the state, $x_k$, is a normal (Gaussian) distribution. A normal distribution has a great property: the entire distribution can be described with two parameters, the *mean* and the *covariance*. They’re also referred to as the *first* and *second* moments, respectively. To formally define them:

$$\hat{x}_k = E[x_k], \qquad P_k = \mathrm{Cov}(x_k)$$

where $E[\cdot]$ is the expectation operator, and the covariance $P_k$ is formally defined as:

$$P_k = E\big[(x_k - \hat{x}_k)(x_k - \hat{x}_k)^T\big]$$

Similarly, we define the covariances of the process and measurement noises:

$$Q_k = E[w_k w_k^T], \qquad R_k = E[v_k v_k^T]$$

Also, these noises are zero mean:

$$E[w_k] = 0, \qquad E[v_k] = 0$$

To simplify things a little, let’s assume the noise covariances are also time-invariant: $Q_k = Q$ and $R_k = R$.

Since a normal distribution can be fully represented by its mean and covariance, a Kalman filter implementation naturally keeps track of only two quantities: the mean and the covariance. Thus, all of the Kalman filter equations below ultimately describe just two things: the mean and the covariance.
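To make this concrete, here is a sketch of everything a (time-invariant) Kalman filter actually stores — all numbers below are made up for illustration:

```python
import numpy as np

# The filter's entire belief about a 2-state system is just a mean
# vector and a covariance matrix (illustrative values).
x_hat = np.array([0.0, 1.0])   # mean of the state estimate
P = np.diag([1.0, 0.5])        # covariance (our uncertainty)

# Time-invariant noise covariances, as assumed in the text
Q = 0.01 * np.eye(2)           # process noise covariance
R = np.array([[0.1]])          # measurement noise covariance (1 sensor)
```

A covariance matrix is always symmetric (and positive semi-definite); every equation in the sections below just transforms `x_hat` and `P`.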

## Algorithm Steps

The Kalman filter is generally understood as a two-step algorithm: 1) prediction, and 2) update. The prediction (or propagation) step predicts/propagates the current state at time $k-1$ to the next state at time $k$.

The above diagram (a graphical model) depicts how the algorithm flows. Each circle represents a random variable, where the subscript indicates the time step. Each arrow between two state variables is a prediction step, and each arrow from a measurement to a state represents an update step. So given the measurements above, $z_1$, $z_3$, and $z_4$ (note there is no measurement at time 2), the filter follows these steps:

- Initialize the Kalman filter at time 0
- Predict to time 1
- Update with the measurement $z_1$ at time 1
- Predict to time 2
- Predict to time 3
- Update with the measurement $z_3$ at time 3
- Predict to time 4
- Update with the measurement $z_4$ at time 4
- Predict to time 5
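The schedule above can be sketched as a simple loop. Here is a toy scalar example (a random walk with $F = H = 1$ and no control input; all numbers are made up), where the `measurements` dictionary holds $z_1$, $z_3$, and $z_4$ and has no entry for times with no measurement:

```python
# Toy scalar Kalman filter following the predict/update schedule above.
# F = H = 1 (random walk), no control input; all values illustrative.
F, H = 1.0, 1.0
Q, R = 0.1, 0.5

x_hat, P = 0.0, 1.0                      # initialize at time 0
measurements = {1: 0.9, 3: 1.2, 4: 1.1}  # z_1, z_3, z_4 (made up)

for k in range(1, 6):
    # Predict to time k (always happens as time moves forward)
    x_hat = F * x_hat
    P = F * P * F + Q

    # Update only when a measurement z_k exists (skipped at k = 2, 5)
    z = measurements.get(k)
    if z is not None:
        K = P * H / (H * P * H + R)      # Kalman gain (scalar form)
        x_hat = x_hat + K * (z - H * x_hat)
        P = (1 - K * H) * P
```

Notice that prediction alone grows `P` (uncertainty accumulates), while each update shrinks it.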

## Prediction

As time progresses from one step to the next, we need to move the state forward in time, and the dynamics/system model defined by $F$ and $B$ describes exactly that. To push the state forward:

$$\hat{x}_{k|k-1} = F \hat{x}_{k-1|k-1} + B u_k$$

$$P_{k|k-1} = F P_{k-1|k-1} F^T + Q$$

where $\hat{x}_{k|k-1}$ and $P_{k|k-1}$ represent the predicted mean and predicted covariance at time $k$ given information up to time $k-1$, respectively. Prediction will almost always happen whenever any time progression occurs.
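A minimal NumPy sketch of the prediction step, using the time-invariant model from earlier (the constant-velocity matrices below, with $\Delta t = 1$, are purely illustrative):

```python
import numpy as np

def predict(x_hat, P, F, B, u, Q):
    """One Kalman prediction step (sketch of the equations above)."""
    x_pred = F @ x_hat + B @ u   # push the mean forward in time
    P_pred = F @ P @ F.T + Q     # covariance propagates and grows by Q
    return x_pred, P_pred

# Illustrative constant-velocity model with dt = 1 (assumed values)
F = np.array([[1.0, 1.0],
              [0.0, 1.0]])
B = np.array([[0.5],
              [1.0]])
Q = 0.01 * np.eye(2)

x_pred, P_pred = predict(np.array([0.0, 1.0]),  # prior mean
                         np.eye(2),             # prior covariance
                         F, B, np.array([0.0]), Q)
```

With zero control input, the mean simply follows the dynamics ($[0, 1] \mapsto [1, 1]$), and the covariance is inflated by $Q$.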

## Update

Once a measurement is available (i.e. $z_k$ exists), we can apply this measurement to correct, or update, the current estimates. The update step involves a few more equations than the prediction step:

$$y_k = z_k - H \hat{x}_{k|k-1} \quad \text{(innovation)}$$

$$S_k = H P_{k|k-1} H^T + R \quad \text{(innovation covariance)}$$

$$K_k = P_{k|k-1} H^T S_k^{-1} \quad \text{(Kalman gain)}$$

$$\hat{x}_{k|k} = \hat{x}_{k|k-1} + K_k y_k \quad \text{(updated mean)}$$

$$P_{k|k} = (I - K_k H) P_{k|k-1} \quad \text{(updated covariance)}$$

Note that each equation has a name or description in parentheses. We will go over what each of them can tell us about the estimator later.

Relating this to the figure above, the measurement at time 2 doesn’t exist. When no measurement is available, the prior estimate simply becomes the posterior estimate: $\hat{x}_{2|2} = \hat{x}_{2|1}$ and $P_{2|2} = P_{2|1}$.
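A minimal NumPy sketch of the update step, including the no-measurement case where the prior is passed through unchanged (the matrices in the usage example are illustrative):

```python
import numpy as np

def update(x_pred, P_pred, z, H, R):
    """One Kalman update step (sketch of the equations above).
    If no measurement exists (z is None), the prior becomes the
    posterior unchanged."""
    if z is None:
        return x_pred, P_pred
    y = z - H @ x_pred                    # innovation
    S = H @ P_pred @ H.T + R              # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)   # Kalman gain
    x_new = x_pred + K @ y                # updated (posterior) mean
    P_new = (np.eye(len(x_pred)) - K @ H) @ P_pred  # updated covariance
    return x_new, P_new

# Illustrative usage: position-only measurement of a 2-state system
H = np.array([[1.0, 0.0]])
R = np.array([[0.5]])
x_new, P_new = update(np.array([1.0, 1.0]),               # predicted mean
                      np.array([[2.0, 1.0], [1.0, 1.0]]), # predicted cov.
                      np.array([1.2]), H, R)
```

The gain `K` weighs how much the innovation corrects the predicted mean; the posterior covariance is always smaller than (or equal to) the prior one.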

Please leave a comment and let me know what you think! If you have any questions, leave them here too 🙂