Regression models: working with MCMC draws

Learn how extract, wrangle and plot MCMC posterior draws
Author

Stefano Coretta

Published

August 5, 2024

1 MCMC what?

Bayesian regression models fitted with brms/Stan use the Markov Chain Monte Carlo (MCMC) sampling algorithm to estimate the probability distributions of the model’s coefficient.

You have encountered MCMC algorithm in Introduction to regression. What the MCMC algorithm does is to sample values (draws) from the posterior distribution to reconstruct it (to learn more about this, see Finding posteriors through sampling by Elizabeth Pankratz and resources therein).

When you run a model with brms, those sampled values (draws) are stored in the model object. Virtually all operations on model are actually done with those draws.

We normally fit BRMs with four MCMC chain. The sampling algorithm within each chain runs for 2000 iterations by default. The first half (1000 iterations) are used to “warm up” the algorithm (i.e. optimise a few parameters of the algorithm) while the second half (1000 iterations) are the ones to reconstruct the posterior distribution.

For four chains ran with 2000 iterations of which 1000 for warm up, we end up with 4000 iterations we can use to learn details about the posteriors.

The rest of the post will teach you how to extract and manipulate the model’s draws.

2 Extract MCMC posterior draws

Let’s read the winter2012/polite.csv data again first.

library(tidyverse)

polite <- read_csv("data/winter2012/polite.csv")
Rows: 224 Columns: 27
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (6): subject, gender, birthplace, musicstudent, task, attitude
dbl (21): months_ger, scenario, total_duration, articulation_rate, f0mn, f0s...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
polite

Now we can reload the Bayesian regression model from Introduction to regression models (Part II). We simply use the same code: but now, instead of the model being fit again, the code will just read the file cache/hnr_bm.rds.

library(brms)

hnr_bm <- brm(
  HNRmn ~ attitude,
  family = gaussian(),
  data = polite,
  cores = 4,
  file = "cache/hnr_bm",
  seed = 2913
)

It’s always good to inspect the summary to make sure the model is the correct one.

summary(hnr_bm, prob = 0.8)
 Family: gaussian 
  Links: mu = identity; sigma = identity 
Formula: HNRmn ~ attitude 
   Data: polite (Number of observations: 224) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Regression Coefficients:
            Estimate Est.Error l-80% CI u-80% CI Rhat Bulk_ESS Tail_ESS
Intercept      16.28      0.15    16.08    16.47 1.00     4239     2929
attitudepol     1.25      0.22     0.96     1.54 1.00     3854     2983

Further Distributional Parameters:
      Estimate Est.Error l-80% CI u-80% CI Rhat Bulk_ESS Tail_ESS
sigma     1.69      0.08     1.59     1.79 1.00     3898     2885

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

The Regression Coefficients table includes Intercept and attitudepoliteite which is what we expect from the model formula and data. We are good to go!

Now we can extract the MCMC draws from the model using the as_draws_df() function.

This function returns a tibble with values obtained at each draw of the MCMC draws. Since we fitted the model with 4 chains and 1000 sampling draws per chain, there is a total of 4000 drawn values (i.e. 4000 rows).

hnr_bm_draws <- as_draws_df(hnr_bm)
hnr_bm_draws

Don’t worry about the lprior and lp__ columns. The columns .chain, .iteration and .draw indicate:

  • The chain number (1 to 4).
  • The iteration number within chain (1 to 1000).
  • The draw number across all chains (1 to 4000).

The following columns contain the drawn values at each draw for three parameters of the model: b_Intercept, b_attitude and sigma. To remind yourself what these mean, let’s have a look at the mathematical formula of the model.

\[ \begin{align} \text{HNRmn} & \sim Gaussian(\mu, \sigma) \\ \mu & = \beta_0 + \beta_1 \cdot \text{is\_polite} \\ \end{align} \]

So:

  • b_Intercept is \(\beta_0\). This is the mean HNR when the attitude is informal.
  • b_attitudepolite is \(\beta_1\). This is the difference in HNR between polite and informal attitude.
  • sigma is \(\sigma\). This is the overall standard deviation.

3 Summaries and plotting of posterior draws

The Regression Coefficients table reports summaries of the distribution of the drawn values of b_Intercept and b_attitudepolite. These summary measures are the mean (Estimate), the standard deviation (Est.error) and the lower and upper limits of the credible interval.

We can of course obtain those same measures ourselves from the draws tibble!

library(posterior)
This is posterior version 1.6.0

Attaching package: 'posterior'
The following objects are masked from 'package:stats':

    mad, sd, var
The following objects are masked from 'package:base':

    %in%, match
hnr_bm_draws |> 
  select(-sigma) |> 
  pivot_longer(b_Intercept:b_attitudepol) |> 
  group_by(name) |> 
  summarise(
    mean = mean(value), sd = sd(value),
    lo_80 = quantile2(value, 0.1), up_80 = quantile2(value, 0.9)
  ) |> 
  # fancy way of mutating multiple columns
  mutate(
    across(mean:up_80, ~round(., 2))
  )
Warning: Dropping 'draws_df' class as required metadata was removed.

Hopefully now it is clear where the values in the Regression Coefficient table come from and how these are related to the MCMC draws.

Next we can plot the (posterior) probability distribution of the draws for any parameter. Let’s plot b_attitudepolite. This will be the posterior probability density of the difference in HNR between the polite and informal attitude.

hnr_bm_draws |>
  ggplot(aes(b_attitudepol)) +
  geom_density() +
  geom_rug(alpha = 0.2)

4 Calculate predictions based on the draws

The question we ended up with in Introduction to regression models (Part II) was: what about the posterior probability of the predicted HNR when the attitude is polite?.

So far, we have seen the predicted HNR when attitude is informal and the mean difference between polite and informal.

To calculate the predicted HNR when attitude is polite, we should look back at the equation of \(\mu\) from the model mathematical formulae.

\[ \begin{align} \mu & = \beta_0 + \beta_1 \cdot \text{is\_polite} \\ \end{align} \]

When is_polite is 0, attitude is informal and we have \(\mu = \beta_0\). When is_polite is 1, attitude is polite and we have \(\mu = \beta_0 + \beta_1\).

So, to get the predicted HNR when attitude is polite, we just need to sum b_Intercept (\(\beta_0\)) and b_attitudepolite (\(\beta_1\)).

hnr_bm_draws_pred <- hnr_bm_draws |> 
  mutate(
    # predicted HNR for informal attitude
    informal = b_Intercept,
    # predicted HNR for polite attitude
    polite = b_Intercept + b_attitudepol
  )

hnr_bm_draws_pred

The sum operation is applied row-wise, to each draw. So, for polite, the value of b_Intercept at draw 1 is added to the value of b_attitudepolite at draw 1, and so on. You end up with a list of sums that has the same length as the initial draws (here, 4000, i.e. 1000 per chain!). Then you can summarise and plot this new list as you did with the b_ coefficients.

To make that easier, let’s pivot longer.

hnr_bm_draws_pred_long <- hnr_bm_draws_pred |> 
  select(.chain, .iteration, .draw, informal, polite) |> 
  pivot_longer(informal:polite, names_to = "attitude", values_to = "pred")
Warning: Dropping 'draws_df' class as required metadata was removed.
hnr_bm_draws_pred_long

5 Plot and summarise posterior predictions

Now let’s make a violin plot of the predicted HNR in informal and polite attitude!

hnr_bm_draws_pred_long |> 
  ggplot(aes(attitude, pred)) +
  geom_violin(width = 0.2) +
  labs(
    x = "Attitude", y = "Predicted HNR"
  )

We can use the ggdist package to plot something a bit fancier, like a half eye. Check the ?stat_halfeye documentation to learn what those lines are.

library(ggdist)

Attaching package: 'ggdist'
The following objects are masked from 'package:brms':

    dstudent_t, pstudent_t, qstudent_t, rstudent_t
hnr_bm_draws_pred_long |> 
  ggplot(aes(pred, attitude)) +
  stat_halfeye() +
  labs(x = "Predicted HNR", y = "Attitude")

Another useful stat is stat_interval().

hnr_bm_draws_pred_long |> 
  ggplot(aes(attitude, pred)) +
  stat_interval() +
  labs(
    x = "Attitude", y = "Predicted HNR"
  )

And finally, let’s summarise the predictions.

hnr_bm_draws_pred_long |> 
  group_by(attitude) |> 
  summarise(
    mean = mean(pred), sd = sd(pred),
    lo_80 = quantile2(pred, 0.1), up_80 = quantile2(pred, 0.9)
  ) |> 
  # fancy way of mutating multiple columns
  mutate(
    across(mean:up_80, ~round(., 2))
  )

You could report the summary like so:

According to the model, the predicted HNR when attitude is informal is between 16 and 16.48 at 80% confidence (\(\beta\) = 16.27, SD = 0.16). When attitude is polite, the predicted HNR is between 17.31 and 17.73 (\(\beta\) = 17.53, SD = 0.16).