In my previous post on Bayes Rule, I showed an example of determining what kind of apple is in a paper bag. In that example, all random variables are *discrete*: the type of apple (red Gala or green Fuji) and the possible measurements (heavy or light) each take one of a few values. In such cases, it tends to be easy to compute the exact probabilities using summation. When the random variables are *continuous*, however, this is no longer the case.

#### Example) One-dimensional GPS Measurement

Imagine waking up in the middle of a hallway. It is a weird, roofless, straight hallway with no windows, no markers, no pillars, no nothing. It is a purely white hallway, so long that you lose track of how far you have walked. In your right hand is a GPS device telling you where you are every 5 minutes. You want to look around, but you are afraid of losing track of where you are, because there is nothing to use as a reference.

You want a consistently accurate location, but this GPS reports your position only every 5 minutes, and only to within a *2.0* m margin of error. If you were holding a GPS that updated every second, you would always have a pretty good idea of where you are, within that *2.0* m margin. During each 5-minute gap, however, you become more and more uncertain as you walk.

Let’s think about what happens if you walk for 5 minutes. At the beginning of the walk, you are pretty certain of where you are (*0.0* m), say to within *0.05* m = *5* cm. You have a pretty good sense of how far each footstep takes you, but (again) it’s never perfect. Let’s say your uncertainty grows *3* cm for every *1.0* m you move. Moreover, because this weird hallway has no features for you to track, the uncertainty can only grow during these 5 minutes.

You’ve now walked for about 5 minutes. Assuming a relatively comfortable pace of *3* mph = *1.34* m/s = *80.4* m per minute, you have walked *402.0* m down the hallway, and your uncertainty has grown to \frac{3 \mathrm{cm}}{1 \mathrm{m}} \cdot 402.0 \mathrm{m} = 1206 \mathrm{cm} = 12.06 \mathrm{m}. This is a large uncertainty. Let’s see what it means.
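As a quick sanity check of this arithmetic, here is a minimal sketch. The function names are mine, and the growth rate of *0.03* m per metre is the one implied by the *12.06* m prior standard deviation used later in this post:

```python
# Hypothetical helpers for the walking example. The 0.03 m-per-m growth
# rate is an assumption chosen to match the 12.06 m prior used below.

def walked_distance_m(speed_mps: float, minutes: float) -> float:
    """Distance covered at a constant walking speed."""
    return speed_mps * minutes * 60.0

def grown_uncertainty_m(rate_m_per_m: float, distance_m: float) -> float:
    """Uncertainty that accumulates linearly with distance walked."""
    return rate_m_per_m * distance_m

distance = walked_distance_m(1.34, 5.0)            # 3 mph ≈ 1.34 m/s, for 5 min
sigma_prior = grown_uncertainty_m(0.03, distance)  # ≈ 402.0 m, ≈ 12.06 m
```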

We need a way to represent your location uncertainty. You can’t pinpoint where you are because of the accumulated uncertainty about your location. However, you still have some idea of where you are in terms of *probability*. Based on your speed (*3* mph), you *could* be at the *402.0* m mark, but there is also a good chance that you are merely *near* *402.0* m. We can use a probability distribution here.

We’ll cover probability distributions in a later post, but in short, a probability density function (pdf) is a function representing how probability is *distributed*. A pdf p(x) has the following properties: i) p(x) \geq 0, ii) \mathrm{Pr}(a \leq x \leq b) = \int_a^b p(x) dx, and, naturally, iii) \int_{-\infty}^\infty p(x)dx = 1. A pdf can look like any function as long as these properties are met. There are, however, a good number of named pdfs that can be represented nicely with a few parameters.
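These properties are easy to verify numerically. Below is a small sketch (plain Python, no libraries) that checks property iii) for a normal density with a crude Riemann sum:

```python
import math

def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Density of a normal distribution N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

# Property iii): the density integrates to 1. Approximate the integral
# with a Riemann sum over ±8 standard deviations (the tails beyond that
# carry negligible mass).
mu, sigma, dx = 0.0, 1.0, 0.001
total = sum(normal_pdf(mu + k * dx, mu, sigma) * dx
            for k in range(-8000, 8001))
# total is very close to 1.0
```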

Among the many, the most popular choice is probably the normal, or Gaussian, distribution. It looks like a bell curve and can be represented with two constants: the mean (\mu) and the standard deviation (\sigma). One standard deviation around the mean corresponds to about 68% of the probability: \mathrm{Pr}(\mu-\sigma \leq x \leq \mu+\sigma) \approx 0.68.
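That 68% figure follows directly from the normal cumulative distribution function, which Python’s standard library can evaluate through math.erf. A quick check:

```python
import math

def normal_cdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """CDF of N(mu, sigma^2), written in terms of the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

# Probability mass within one standard deviation of the mean:
# Pr(mu - sigma <= x <= mu + sigma) = Phi(1) - Phi(-1) ≈ 0.6827.
p_one_sigma = normal_cdf(1.0) - normal_cdf(-1.0)
```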

So, let’s represent your location (x) as a normal pdf,

p(x) = \mathcal{N}(x; 402.0 \mathrm{m}, 12.06^2 \mathrm{m}^2)

where \mathcal{N}(x; \mu, \sigma^2) denotes a normal distribution over x with mean \mu and variance \sigma^2. This is the prior information about your location, p(x). Below is what this pdf looks like.

Let’s also define the measurements. The only measurement you have is your GPS device telling you where you are to within *2.0* m. Let us assume that the GPS noise nicely follows a normal distribution; this would be unrealistic in an urban area, with all its tall buildings and limited open sky, but this example environment happens to be ideal for GPS measurements. Thus, we assume

p(z|x) = \mathcal{N}(z; h(x), 2.0^2 \mathrm{m}^2), \quad h(x) = x

where z denotes the GPS measurement.

Let’s discuss this term a little bit. It is the pdf of a GPS measurement *given* your current location. Note that when something is *given*, you *know* its value (or at least assume you do). This term therefore tells you what your GPS measurement will likely be *assuming* that you are at *402.0* m precisely. The *measurement function*, h(x), represents what the measurement should be as a function of x. If we write out the normal distribution above, it becomes a function of x in which x terms appear; this is important to remember for future discussions, when we marginalize with respect to x and/or z.

Looking back at Bayes Rule, p(A|B) = \frac{p(B|A) \cdot p(A)}{p(B)}, your location after one GPS measurement can be expressed as:

p(x|z) = \frac{p(z|x) \cdot p(x)}{p(z)}

which becomes

p(x|z) = \eta \cdot \mathcal{N}(z; x, 2.0^2 \mathrm{m}^2) \cdot \mathcal{N}(x; 402.0 \mathrm{m}, 12.06^2 \mathrm{m}^2)

where \eta = 1/p(z) is a normalizing constant.

When we plot this *posterior* pdf, we get the following:

When you carry out the calculation above, you get a normalized normal distribution with a mean of *402.0* m and a standard deviation of about *1.97* m (the precisions add: 1/\sigma^2 = 1/12.06^2 + 1/2.0^2). Note that when a GPS measurement comes in, your location estimate suddenly becomes much more confident; the standard deviation dropped from *12.06* m to about *1.97* m. It dropped so much because i) your prior had such a large uncertainty and ii) your GPS measurement is comparatively precise.
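Since the prior and the likelihood are both Gaussian, this posterior has a closed form: precisions (inverse variances) add, and the posterior mean is a precision-weighted average. A minimal sketch with this example’s numbers, assuming the GPS happens to read *402.0* m:

```python
import math

def gaussian_bayes_update(mu_prior: float, sigma_prior: float,
                          z: float, sigma_z: float) -> tuple[float, float]:
    """Fuse a Gaussian prior N(mu_prior, sigma_prior^2) with a direct
    Gaussian measurement z of the state whose noise std is sigma_z."""
    var_post = 1.0 / (1.0 / sigma_prior**2 + 1.0 / sigma_z**2)  # precisions add
    mu_post = var_post * (mu_prior / sigma_prior**2 + z / sigma_z**2)
    return mu_post, math.sqrt(var_post)

# Prior N(402.0, 12.06^2); GPS measurement z = 402.0 m with sigma_z = 2.0 m.
mu_post, sigma_post = gaussian_bayes_update(402.0, 12.06, 402.0, 2.0)
# sigma_post ≈ 1.97 m: far tighter than the 12.06 m prior.
```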

We observed how one’s location estimate changes when a GPS measurement is received, despite the measurement’s own *2.0* m uncertainty. One can easily imagine what happens when multiple measurements are received over time: the estimate becomes more and more confident.
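One way to see this: with each new (independent) GPS fix, another Gaussian likelihood is multiplied in, so the precisions keep adding and the standard deviation keeps shrinking. A sketch, reusing this example’s numbers:

```python
# Repeated GPS fixes at a stationary location: every update adds the
# measurement's precision (1/sigma_z^2) to the running precision.
sigma_prior = 12.06   # m, prior uncertainty from this example
sigma_z = 2.0         # m, GPS measurement noise

precision = 1.0 / sigma_prior**2
sigmas = []
for _ in range(5):
    precision += 1.0 / sigma_z**2
    sigmas.append((1.0 / precision) ** 0.5)
# sigmas is strictly decreasing: each measurement tightens the estimate.
```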

Unlike our previous example of Bayes Rule with discrete probabilities, working with probability distributions requires significantly more computation. This is especially true if you choose to work with an arbitrary distribution that cannot be represented by an equation and a few parameters. In that case, you need to either compute the posterior distribution numerically or resort to approximation.
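For completeness, here is a sketch of that numerical route: represent the prior on a grid, multiply by the likelihood pointwise, and renormalize. The bimodal prior below is an arbitrary choice, just to illustrate a distribution with no simple parametric form:

```python
import math

def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

dx = 0.01
grid = [380.0 + k * dx for k in range(int(44.0 / dx) + 1)]  # 380 m .. 424 m

# Arbitrary prior: a mixture of two bumps (not one Gaussian).
prior = [0.5 * normal_pdf(x, 395.0, 3.0) + 0.5 * normal_pdf(x, 410.0, 3.0)
         for x in grid]
# Likelihood of a GPS reading z = 402.0 m, sigma_z = 2.0 m, at each x.
likelihood = [normal_pdf(402.0, x, 2.0) for x in grid]

unnorm = [p * l for p, l in zip(prior, likelihood)]
evidence = sum(u * dx for u in unnorm)       # numerical estimate of p(z)
posterior = [u / evidence for u in unnorm]   # integrates to 1 by construction
```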