Bayes Rule: Example with probability density functions

In my previous post on Bayes Rule, I showed an example of determining what kind of apple is in a paper bag. In that example, all random variables are discrete: the type of apple (red Gala or green Fuji) and the possible measurements (heavy or light). In such a case, it tends to be easy to compute the exact probability using summation. When the random variables are continuous, however, that is no longer the case: we have to work with probability density functions, and the sums become integrals.

Example) One-dimensional GPS Measurement

Imagine waking up in the middle of a hallway. It is a weird, roofless, straight hallway with no windows, no markers, no pillars, nothing. It’s just a purely white hallway, so long that you lose track of how far you have walked. In your right hand, there is a GPS device telling you where you are every 5 minutes. You want to look around, but you’re afraid of losing track of where you are because there’s nothing to use as a reference.

You want a consistently accurate location, but the problem is that this GPS only reports your location every 5 minutes, and it does so within a 2.0 m margin of error. If you were holding a GPS that updates every second, you’d have a pretty good idea of where you are, within its 2.0 m margin. During the 5 minutes between reports, however, you become more and more uncertain as you walk.

Let’s think about what happens if you walk for 5 minutes. At the beginning of the walk, you are pretty certain of where you are (0.0 m), say within 0.05 m = 5 cm. You have a pretty good sense of how far you go with each footstep, but (again) it’s never perfect. Let’s say your uncertainty grows by 3 cm for every 1.0 m you move. Besides, because this weird hallway has no features for you to track, the uncertainty can only grow during these 5 minutes.

You’ve walked for about 5 minutes now. Assuming you walk at a relatively comfortable pace of 3 mph = 1.34 m/s = 80.4 meters per minute, you have walked 402.0 meters down the hallway. And your uncertainty has grown to \frac{3 \mathrm{cm}}{1 \mathrm{m}} \cdot 402.0 \mathrm{m} = 1206 \mathrm{cm} = 12.06 \mathrm{m}. This is a large uncertainty. But let’s see what this means.
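The arithmetic in this step can be checked with a few lines of Python. This is only a sketch: the variable names are mine, and the growth rate of 3 cm per meter is an assumption, chosen because it is the value that reproduces the 12.06 m standard deviation used for the prior later in this post.

```python
# Walking-distance and uncertainty bookkeeping for the hallway example.
MPH_TO_MPS = 0.44704  # exact: 1 mile = 1609.344 m, 1 hour = 3600 s

speed_mps = round(3 * MPH_TO_MPS, 2)  # 3 mph, rounded to 1.34 m/s as in the text
walk_time_s = 5 * 60                  # 5 minutes between GPS reports
growth_per_m = 0.03                   # assumed: 3 cm of uncertainty per meter walked

distance_m = speed_mps * walk_time_s
uncertainty_m = growth_per_m * distance_m

print(f"distance walked: {distance_m:.1f} m")
print(f"accumulated uncertainty: {uncertainty_m:.2f} m")
```

The initial 5 cm of uncertainty is left out here, since it is tiny next to what accumulates over 402 m.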

We need a way to represent your location uncertainty. You can’t pinpoint where you are because of the accumulated uncertainty. However, you do have some idea of where you are in terms of probability. Based on your speed (3 mph), you are most likely at the 402.0 m mark, but there’s also a good chance that you are somewhere near it. We can use a probability distribution here.

We’ll cover probability distributions in a later post, but in short, a probability density function (pdf) is a function representing how probability is distributed. A pdf p(x) has the following properties: i) p(x) \geq 0, ii) \mathrm{Pr}(a \leq x \leq b) = \int_a^b p(x) dx, and naturally, iii) \int_{-\infty}^\infty p(x)dx = 1. A pdf can look like any function as long as the above properties are met. There are, however, a good number of named pdfs that are neatly represented with a few parameters.

Among the many, one example, probably the most popular choice, is the normal or Gaussian distribution. It looks like a bell curve and can be represented with two constants: mean (\mu) and standard deviation (\sigma). Also, the interval within one standard deviation of the mean holds about 68% of the probability, \mathrm{Pr}(\mu-\sigma \leq x \leq \mu+\sigma) \approx 0.68.
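As a quick sanity check, the pdf properties and the 68% figure can be verified numerically for a normal distribution. The sketch below uses only the Python standard library; the helper names `normal_pdf` and `normal_cdf` are mine, and the choice of \mu = 0, \sigma = 1 is arbitrary.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    r = (x - mu) / sigma
    return math.exp(-0.5 * r * r) / (sigma * math.sqrt(2.0 * math.pi))

def normal_cdf(x, mu, sigma):
    """Pr(X <= x) for the same distribution, via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

mu, sigma = 0.0, 1.0  # arbitrary example values

# Property ii): probability of landing within one standard deviation of the mean.
within_one_sigma = normal_cdf(mu + sigma, mu, sigma) - normal_cdf(mu - sigma, mu, sigma)

# Property iii): total probability, approximated by a midpoint sum over +/- 8 sigma.
n = 20000
lo, hi = mu - 8 * sigma, mu + 8 * sigma
dx = (hi - lo) / n
total = sum(normal_pdf(lo + (i + 0.5) * dx, mu, sigma) for i in range(n)) * dx

print(f"Pr(mu - sigma <= x <= mu + sigma) = {within_one_sigma:.4f}")
print(f"integral of p(x) dx = {total:.6f}")
```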

So, let’s represent your location (x) as a normal pdf,

    \begin{equation*} p(x) = \mathcal{N}(x; 402.0 \mathrm{m}, (12.06 \mathrm{m})^2) \end{equation*}

where \mathcal{N}(x; \mu, \sigma^2) represents a normal distribution of x with mean \mu and standard deviation \sigma. This is the prior information about your location, p(x). Below is what this pdf looks like.

Let’s also define the measurements. The only measurement you have is your GPS device telling you where you are within a 2.0 m radius. Let us assume that the GPS measurement nicely follows a normal distribution. That would be an unrealistic assumption in an urban area because of all the tall buildings and limited open area, but this example environment happens to be ideal for GPS measurements. Thus, we assume

    \begin{equation*} \begin{align} p(z|x = 402.0\mathrm{m}) &= \mathcal{N}(z; h(x) = x = 402.0 \mathrm{m}, \sigma_z^2) \\ &= \mathcal{N}(z; 402.0 \mathrm{m}, (1.36 \mathrm{m})^2) \end{align} \end{equation*}

Let’s discuss this term a little bit. It is the pdf of a GPS measurement given your current location. Note that when something is given, you know its value (or at least you assume so). Therefore, it tells you what your GPS measurement will likely be, assuming that you’re at precisely 402.0 m. The measurement function, h(x), represents what the measurement will be as a function of x; here the GPS reports position directly, so h(x) = x. If we write out the normal distribution above, x appears in it through h(x), so the expression is a function of x as well as z; this is important to remember for future discussions when we do marginalization with respect to x and/or z.
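To make the point about p(z|x) depending on both z and x concrete, here is a small sketch. The function names `h` and `likelihood` are mine, and the 1.36 m standard deviation is the value used in the equation above.

```python
import math

SIGMA_Z = 1.36  # GPS standard deviation in meters, as in the example

def h(x):
    """Measurement function: the GPS reads the position directly."""
    return x

def likelihood(z, x, sigma_z=SIGMA_Z):
    """p(z | x): density of reading z when the true position is x."""
    r = (z - h(x)) / sigma_z
    return math.exp(-0.5 * r * r) / (sigma_z * math.sqrt(2.0 * math.pi))

# Fix x = 402.0 m and vary the reading z: this is a pdf over z.
for z in (400.0, 402.0, 404.0):
    print(f"p(z={z} | x=402.0) = {likelihood(z, 402.0):.4f}")

# Fix z = 402.0 m and vary the position x: same expression, now a function of x.
for x in (400.0, 402.0, 404.0):
    print(f"p(z=402.0 | x={x}) = {likelihood(402.0, x):.4f}")
```

With h(x) = x the two loops give the same numbers, because only the difference z - x enters the expression. A different measurement function would break that symmetry.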

Looking back at Bayes Rule, p(A|B) = \frac{p(B|A) \cdot p(A)}{p(B)}, your location after one GPS measurement can be expressed as:

    \[ p(x|z) = \frac{p(z|x) \cdot p(x)}{p(z)} \]

which, with h(x) = x in the likelihood so that it stays a function of x, becomes

    \begin{equation*} \begin{align} p(x|z) &= \frac{\mathcal{N}(z; x, (1.36 \mathrm{m})^2) \cdot \mathcal{N}(x; 402.0 \mathrm{m}, (12.06 \mathrm{m})^2) }{p(z)} \\ &= \frac{\mathcal{N}(z; x, (1.36 \mathrm{m})^2) \cdot \mathcal{N}(x; 402.0 \mathrm{m}, (12.06 \mathrm{m})^2) }{\int_{-\infty}^\infty \mathcal{N}(z; x, (1.36 \mathrm{m})^2) \cdot \mathcal{N}(x; 402.0 \mathrm{m}, (12.06 \mathrm{m})^2) dx} \end{align} \end{equation*}

When we plot this posterior pdf, we get the following:

When you carry out the calculation above for a GPS reading of z = 402.0 m, you get a normal distribution with a mean of 402.0 m and a standard deviation of 1.3514 m. Note that when a GPS measurement comes in, your location estimate suddenly becomes very confident; the standard deviation decreases from 12.06 m to 1.3514 m. It decreases this much because i) your prior has such a large uncertainty and ii) your GPS measurement is comparatively precise, so the posterior is dominated by the measurement.
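The 1.3514 m figure can be reproduced in two ways. For a normal prior and a normal likelihood with h(x) = x, the posterior is again normal, with \sigma_{post}^2 = (1/\sigma_x^2 + 1/\sigma_z^2)^{-1} and \mu_{post} = \sigma_{post}^2 (\mu_x/\sigma_x^2 + z/\sigma_z^2). The sketch below evaluates that closed form and cross-checks it by brute force on a grid. It assumes a GPS reading of z = 402.0 m (which is what a posterior mean of 402.0 m implies); the function names and the grid settings are my own choices.

```python
import math

def gaussian_posterior(mu_prior, sigma_prior, z, sigma_z):
    """Posterior mean and std for a normal prior and a normal likelihood with h(x) = x."""
    var_post = 1.0 / (1.0 / sigma_prior**2 + 1.0 / sigma_z**2)
    mu_post = var_post * (mu_prior / sigma_prior**2 + z / sigma_z**2)
    return mu_post, math.sqrt(var_post)

def normal_pdf(x, mu, sigma):
    r = (x - mu) / sigma
    return math.exp(-0.5 * r * r) / (sigma * math.sqrt(2.0 * math.pi))

mu_prior, sigma_prior = 402.0, 12.06  # prior after the 5-minute walk
z, sigma_z = 402.0, 1.36              # assumed GPS reading and its standard deviation

mu_post, sigma_post = gaussian_posterior(mu_prior, sigma_prior, z, sigma_z)

# Numerical cross-check: evaluate p(z|x) * p(x) on a grid and normalize by p(z).
dx = 0.01
xs = [mu_prior - 100.0 + i * dx for i in range(20001)]
unnorm = [normal_pdf(z, x, sigma_z) * normal_pdf(x, mu_prior, sigma_prior) for x in xs]
p_z = sum(unnorm) * dx  # the denominator of Bayes Rule, p(z)
post = [u / p_z for u in unnorm]
mean_num = sum(x * p for x, p in zip(xs, post)) * dx
std_num = math.sqrt(sum((x - mean_num) ** 2 * p for x, p in zip(xs, post)) * dx)

print(f"closed form: mean = {mu_post:.1f} m, std = {sigma_post:.4f} m")
print(f"grid check:  mean = {mean_num:.1f} m, std = {std_num:.4f} m")
```

The closed form is the practical choice when everything is normal. The grid version is slower, but it works for any prior or likelihood you can evaluate.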

We observed how one’s location estimate changes when a GPS measurement is received, despite the measurement’s own uncertainty of 2.0 m. One can easily guess what will happen when multiple measurements are received over time: the estimate becomes more and more confident.
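Here is a rough sketch of that effect. It assumes you stand still while the readings come in (walking would add uncertainty between readings) and that the readings are independent. The reading values are made up for illustration, and the `update` helper is my own name for the normal-prior, normal-likelihood update with h(x) = x.

```python
import math

def update(mu, sigma, z, sigma_z):
    """One Bayes update of a normal belief (mu, sigma) with a normal reading z."""
    var = 1.0 / (1.0 / sigma**2 + 1.0 / sigma_z**2)
    return var * (mu / sigma**2 + z / sigma_z**2), math.sqrt(var)

mu, sigma = 402.0, 12.06                        # prior after the 5-minute walk
readings = [403.1, 401.2, 402.6, 401.9, 402.4]  # hypothetical GPS readings in meters
history = [sigma]                               # standard deviation after each step

for k, z in enumerate(readings, start=1):
    mu, sigma = update(mu, sigma, z, sigma_z=1.36)
    history.append(sigma)
    print(f"after reading {k}: mean = {mu:.2f} m, std = {sigma:.3f} m")
```

Each posterior serves as the prior for the next reading, and the standard deviation shrinks with every update regardless of the values the GPS reports.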

Unlike our previous example of Bayes Rule using discrete probabilities, working with probability distributions requires significantly more computation. This is especially true if you choose to work with an arbitrary distribution that cannot be represented with an equation and a few parameters. In such a case, you’d need to either compute the posterior distribution numerically or use an approximation.
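As one illustration of the numerical route, the posterior can be evaluated on a grid of candidate positions. The sketch below is not the only way to do this. The uniform prior, the grid spacing, and the reading of 418.5 m are all made-up values chosen to show a case with no neat closed form, and `grid_posterior` is a hypothetical helper name.

```python
import math

def normal_pdf(x, mu, sigma):
    r = (x - mu) / sigma
    return math.exp(-0.5 * r * r) / (sigma * math.sqrt(2.0 * math.pi))

def grid_posterior(xs, prior, z, sigma_z):
    """Posterior density on the grid xs, given prior density values and one reading z."""
    dx = xs[1] - xs[0]
    unnorm = [normal_pdf(z, x, sigma_z) * p for x, p in zip(xs, prior)]
    evidence = sum(unnorm) * dx  # numerical stand-in for p(z)
    return [u / evidence for u in unnorm]

# Hypothetical non-Gaussian prior: equally likely anywhere between 380 m and 420 m.
dx = 0.05
xs = [350.0 + i * dx for i in range(2001)]  # grid from 350 m to 450 m
prior = [1.0 / 40.0 if 380.0 <= x <= 420.0 else 0.0 for x in xs]

posterior = grid_posterior(xs, prior, z=418.5, sigma_z=1.36)

area = sum(posterior) * dx
mean = sum(x * p for x, p in zip(xs, posterior)) * dx
print(f"posterior integrates to {area:.3f}, mean = {mean:.2f} m")
```

Because the prior rules out anything beyond 420 m, the posterior is a normal curve cut off on one side, and its mean lands slightly below the GPS reading. The cost of this approach grows with the number of grid points, which is why approximations become attractive in higher dimensions.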

