Bayesian inference
Note

Probability: the probability of an event occurring.
Probabilities can only be between 0 and 1.
Probability
Probability of an event occurring: from 0% to 100%.
Probability of a statistical variable being some numeric value: a bit more complicated…
We need probability distributions!
A probability distribution describes how the probabilities are distributed over the values that a variable can take on.
Two types of probability distributions
Discrete probability distributions.
Continuous probability distributions.
You learned about discrete and continuous variables in Week 2!
Discrete variables (numeric or categorical) follow discrete probability distributions and continuous variables follow continuous probability distributions.
You will never need to calculate probability distributions by hand, but it is useful to know about the two mathematical functions that are used for that purpose.
The Probability Mass Function (PMF) for discrete probability distributions.
The Probability Density Function (PDF) for continuous probability distributions.
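As a minimal sketch of what these two functions do, here is how R exposes them for two common distributions: dbinom() for a discrete PMF and dnorm() for a continuous PDF. The specific parameter values are purely illustrative.

```r
# PMF: probability of exactly 3 successes in 10 trials with p = 0.5
dbinom(3, size = 10, prob = 0.5)

# PDF: density of a Gaussian with mean 160 and SD 10 at the value 170
dnorm(170, mean = 160, sd = 10)
```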
Probability distributions can be summarised with a set of parameters.
Different types of probability distributions have different parameters, and different numbers of them.
Gaussian distribution
The Gaussian probability distribution is a continuous probability distribution and it has two parameters: the mean (µ) and the standard deviation (σ).
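For reference, the PDF of a Gaussian distribution with mean µ and standard deviation σ is:

\[ f(x) = \frac{1}{\sigma \sqrt{2\pi}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right) \]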
Go to Seeing Theory (by Daniel Kunin).
Human height
Simulate human height as a Gaussian variable.
rnorm(): number of observations, mean, SD.
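A minimal sketch of the simulation; the sample size of 200 comes from later in this section, while the population mean and SD (160 cm and 10 cm) are assumptions chosen for illustration.

```r
set.seed(42)  # for reproducibility

# Simulate 200 heights (in cm) from a Gaussian with mean 160 and SD 10.
height <- rnorm(n = 200, mean = 160, sd = 10)

head(round(height))
```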
PDF constructs the density curve of theoretical distributions.
Kernel Density Estimation (KDE) constructs the density curve of empirical data (observations).
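To see the difference, one can overlay the empirical density curve (KDE, via density()) and the theoretical density curve (PDF, via dnorm()) for the simulated heights. This sketch assumes the height vector and population values from the simulation above.

```r
# Empirical density (KDE) of the simulated observations
plot(density(height), main = "Simulated heights: KDE vs theoretical PDF")

# Theoretical density (PDF) of the Gaussian we simulated from
curve(dnorm(x, mean = 160, sd = 10), add = TRUE, lty = 2)
```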
When you simulate data, you know the population mean and SD.
In research, you don’t. You just have observations.
Statistical modelling
Statistical modelling allows you to estimate the mean and SD of the population from the sample. This is statistical inference.
Bayesian Gaussian models do exactly that: estimate mean and SD.
\[ \begin{align} h & \sim Gaussian(\mu, \sigma)\\ \mu & = ...\\ \sigma & = ... \end{align} \]
\[ \begin{align} h & \sim Gaussian(\mu, \sigma)\\ \mu & = P_\mu\\ \sigma & = P_\sigma \end{align} \]
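One way to fit such a model in R is with the brms package; this is a sketch under the assumption that brms is the tool used here, with default priors and an illustrative data frame name.

```r
library(brms)

# Data frame with the simulated heights (name is illustrative)
d <- data.frame(height = height)

# Intercept-only Gaussian model: estimates the posterior distributions
# of mu (the Intercept) and sigma, i.e. the population mean and SD of height.
fit <- brm(
  height ~ 1,
  data = d,
  family = gaussian()
)

summary(fit)
```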
Sample size
Sample size matters: the lower the sample size, the higher the uncertainty.
Let’s try with just 5 observations (out of 200).
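A sketch of refitting on a small subset, assuming the brms model fitted above; the exact way of picking the 5 observations is illustrative.

```r
# Keep only the first 5 of the 200 simulated observations
d_small <- d[1:5, , drop = FALSE]

# Refit the same Gaussian model: the posteriors of mu and sigma
# will be much wider, reflecting the greater uncertainty.
fit_small <- update(fit, newdata = d_small)

summary(fit_small)
```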
We discovered probabilities and probability distributions.
Theoretical probability distributions can be described with PMF/PDF and parameters.
KDE is used for empirical distributions.
(Bayesian) Gaussian models infer the population probability distribution from the data, by estimating posterior probability distributions of mean and SD.
Sample size affects uncertainty.