Regression models: a cheat-sheet

1 One model to rule them all

Regression models (aka linear regression models, linear models, generalised linear models) are a group of statistical models built on one simple idea: an outcome variable \(Y\) can be predicted with a function \(f(X)\) of one or more predictors \(X\).

The “simplest” linear model is the formula of a line[1]:

\[ y = \alpha + \beta x \]

where \(\alpha\) is the intercept of the line and \(\beta\) the slope (aka gradient).
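
In R, you can fit this line to data with lm(). A minimal sketch (the data frame dat and its columns x and y are hypothetical placeholders):

```r
# Fit y = alpha + beta * x by least squares.
m <- lm(y ~ x, data = dat)

# coef() returns the estimated intercept (alpha) and slope (beta).
coef(m)
```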

The principles behind this formula can be extended to represent virtually any other model, independently of the nature of the outcome variable(s) (\(y\)), the predictor(s), the type of relationship between outcome and predictors, and so on. This means that if you master the principles of regression models, you can fit virtually any kind of data with them. You can bid farewell to ANOVAs, \(t\)-tests, \(\chi^2\)-tests, and whatnot. In fact, these can all be thought of as specific cases of regression models. It just so happens that they got themselves a specific name. But the underlying mechanics are the same.

The same goes for “linear models”, “logistic regression”, “generalised linear models”, “mixed-effects regression” and so on. These are all regression models, so they all follow the same principles. And again, the fact that they got specific names is a historical “accident”. Understanding that these named models are in fact all regression models gives you super powers you can use on data (Sauron would be so jealous):

One model to rule them all, one model to fit them,
One model to shrink them all, and in probability bind them;
In the Land of Inference where the distributions lie.

Ehm… perhaps this is not gonna win a poetry contest, but… the message is that with a single tool, i.e. regression models, you can go a long way!

Each of the following sections asks you about the nature of your data and/or experimental design. By answering each, you will find out which “pieces” you need to add to your model structure.

2 Step 0: Number of outcome variables

We will get back to this step at the end of this post, since it makes things a bit more complex.

3 Step 1: Choose a distribution for your outcome variable

The first step towards building a regression model is to choose the family of distributions you believe the outcome variable belongs to. You can start by answering the following question.

Question 1

Is the outcome variable continuous or discrete?

Depending on the answer, check out Section 3.1 or Section 3.2.

3.1 Continuous outcome variable

  • The variable can take on any real number, positive or negative, including 0: Gaussian (aka normal) distribution.

    • There are very few truly Gaussian variables, although in some cases one can speak of “approximate” or “assumed” normality.

    • This family is fitted by default by lm(), lme4::lmer() and brms::brm(). You can explicitly select it with family = gaussian. (A code sketch covering all the continuous families follows this list.)

  • The variable can take on any positive real number only: Log-normal distribution.

    • Durations of segments, words, pauses, etc., are known to be log-normally distributed.

    • Reaction times can be modelled with a log-normal distribution.

    • Measurements taken in Hz (like f0, formants, centre of gravity, …) could be considered to be log-normal.

    • There are other families that could potentially be used depending on the nature of the variable: exponential-Gaussian (reaction times), gamma, …

    • Fit a log-normal model with brms::brm(..., family = lognormal).

  • The variable can take on any real number between 0 and 1, excluding 0 and 1 themselves: Beta distribution.

    • Proportions fall into this category (for example proportion of voicing within closure), although 0 and 1 are not allowed in the beta distribution.
    • Fit a beta model with brms::brm(..., family = Beta).
    • Check this tutorial: Coretta and Bürkner (n.d.).
  • The variable can take on any real number between 0 and 1, including 0, or both 0 and 1: zero-inflated or zero/one-inflated beta (ZOIB) distribution, respectively.

    • If the proportion data includes many 0s and 1s, then this is the ideal distribution to use. ZOIB distributions are somewhat more difficult to fit than a simple beta distribution, so a common practice is to transform the data so that it doesn’t include 0s or 1s (this can be achieved with different techniques, some better than others).
    • Fit a ZOIB model with brms::brm(..., family = zero_one_inflated_beta).
    • Check this tutorial: Heiss (2021).
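
To make these options concrete, here is a minimal sketch of how each continuous family could be fitted with brms (the data frame dat and its columns are hypothetical placeholders):

```r
library(brms)

# Gaussian (the default family): any real number.
m_gauss <- brm(y ~ x, family = gaussian, data = dat)

# Log-normal: positive real numbers only (e.g. durations).
m_lognorm <- brm(duration ~ x, family = lognormal, data = dat)

# Beta: proportions strictly between 0 and 1.
m_beta <- brm(prop_voiced ~ x, family = Beta, data = dat)

# Zero/one-inflated beta: proportions including 0s and/or 1s.
m_zoib <- brm(prop_voiced ~ x, family = zero_one_inflated_beta, data = dat)
```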

3.2 Discrete outcome variable

  • The variable is dichotomous, i.e. it can take one of two levels: Bernoulli distribution.

    • Categorical outcome variables like yes/no, correct/incorrect, or voiced/voiceless follow this distribution.

    • This family is fitted by default when you run glm(..., family = binomial), aka “logistic regression” or “binomial regression”, or with brms::brm(..., family = bernoulli).[2] (A code sketch covering all the discrete families follows this list.)

  • The variable is a count: Poisson distribution.

    • Counts of words, segments, gestures, f0 peaks, …
    • Check out this tutorial: Winter and Bürkner (2021).
    • Fit a Poisson model with brms::brm(..., family = poisson).
    • Sometimes a negative binomial distribution is preferable, if the count data is overdispersed. Fit this model with brms::brm(..., family = negbinomial).
  • The variable is an ordered scale: ordinal (cumulative) model.

    • Likert scales and ratings, language attitude questionnaires.

    • Fit ordinal regression models (aka ordinal logistic regression) with brms::brm(..., family = cumulative).

    • See these tutorials: Verissimo (2021), Bürkner and Vuorre (2019).

  • The variable has more than two levels, but these are not ordered: categorical (multinomial) model.

    • Fit categorical (multinomial) models with brms::brm(..., family = categorical).

    • As far as I know, there isn’t a tutorial for this family, but Lorson, Cummins, and Rohde (2021) uses categorical models. The research compendium (with data and code) of the paper can be found here: https://osf.io/e5av9/.

    • Just a quick note: if you have an outcome variable with 3 levels (A, B and C) and you fit a categorical (multinomial) model with A as the reference level, then \(P(A) = \frac{e^0}{e^0 + e^{\beta_B} + e^{\beta_C}}\), where \(\beta_B\) and \(\beta_C\) are the estimated log-odds of the B and C levels relative to the A level (these are the two intercepts in the model). This makes sense because, if each level is equally probable, \(\beta_B = \beta_C = 0\) and \(\frac{e^0}{e^0 + e^0 + e^0} = \frac{1}{3}\) for each level.
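
And a matching sketch for the discrete families (again, dat and its columns are hypothetical placeholders):

```r
library(brms)

# Bernoulli: dichotomous outcomes (e.g. correct/incorrect).
m_bern <- brm(correct ~ x, family = bernoulli, data = dat)

# Poisson: counts (e.g. number of pauses).
m_pois <- brm(n_pauses ~ x, family = poisson, data = dat)

# Negative binomial: overdispersed counts.
m_negbin <- brm(n_pauses ~ x, family = negbinomial, data = dat)

# Cumulative: ordered scales (e.g. Likert ratings).
m_ord <- brm(rating ~ x, family = cumulative, data = dat)

# Categorical: unordered outcomes with more than two levels.
m_cat <- brm(response ~ x, family = categorical, data = dat)

# Sanity check for the note above: with all log-odds at 0,
# each of the three levels gets probability 1/3.
exp(0) / (exp(0) + exp(0) + exp(0))
#> [1] 0.3333333
```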

4 Step 2: Are there hierarchical groupings and/or repeated measures?

The second step is to ensure that, if the data is structured hierarchically or repeated measures were taken, this is taken into account in the model. Here is where so-called varying terms (aka random effects, group-level effects/terms) come in (Gelman 2005). Models that include random effects/group-level terms are called random-effects models, mixed-effects models, hierarchical models, nested models, or multilevel models. These terms are for all intents and purposes equivalent (it just happens that different traditions use different terms).

As an example, let’s assume you asked a number of participants to read a list of words, and each word was repeated 5 times by each participant. You then took f0 measurements from the stressed vowel of each word, in each repetition. Now the data has a “hierarchical” structure to it:

  • First, observations are grouped by participant (some observations belong to one participant and others to another and so on).
  • Second, observations are grouped by word (some observations belong to one word and others to another and so on).
  • Third, within the observations of each word, some belong to the same participant (or, from a different perspective, within the observations of each participant, some belong to the same word).

The presence of “levels” within the data (whether they come from natural groupings like participant or word, or from repeated measures) breaks one of the assumptions of regression models: that the observations are independent of one another. This is why you must include varying terms in the regression model, to account for this structure (and now you see why they are called hierarchical and multilevel models). If you don’t include any varying terms, your model will treat each observation as independent, and hence it will underestimate the variance and return unreliable results.

In the toy example of f0 measurements, you will want to include varying terms for participant and word. These let the model know about the structure of the data described above. If you have other predictors in the model, you should also add them as varying slopes in the varying terms. For example: (question | participant) + (question | word) (where question is a predictor with levels statement vs question).
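
Putting this together with Step 1, the toy f0 example could be modelled like this (a sketch; the data frame f0_data and its columns are hypothetical placeholders, and the log-normal family follows the suggestion for Hz measurements above):

```r
library(brms)

# Varying intercepts and varying slopes for question,
# by participant and by word.
m_f0 <- brm(
  f0 ~ question + (question | participant) + (question | word),
  family = lognormal,
  data = f0_data
)
```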

Here are some tutorials: Winter (2013), DeBruine and Barr (2021), Kirby and Sonderegger (2018), Bürkner (2018), Veenman, Stefan, and Haaf (2023), Pedersen et al. (2019).

5 Step 3: Are there non-linear effects?

A typical use case for non-linear terms is time-series data or spatial data (i.e. geographic coordinates). Generalised Additive Models (GAMs) allow you to fit non-linear effects using so-called “smooth” (or “smoother”) terms. You can fit a regression model with smooth terms with brms::brm(y ~ s(x)) or with mgcv::gam(y ~ s(x)), among others. See Simpson (2018), Sóskuthy (2017), Sóskuthy (2021), Wieling (2018), Pedersen et al. (2019).
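
For instance, a minimal sketch of a GAM for f0 trajectories over time (the data frame traj and its columns are hypothetical placeholders):

```r
library(mgcv)

# s(time) fits a non-linear (smooth) effect of time on f0.
m_gam <- gam(f0 ~ s(time), data = traj)

# The equivalent Bayesian model with brms:
# m_gam_b <- brms::brm(f0 ~ s(time), data = traj)
```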

6 Step 0-bis: Number of outcome variables

If you want to model just one outcome variable, you are already covered if you went through Steps 1–3. If instead your design has two or more outcome variables (for example F1 and F2, or the durations of the stressed and unstressed vowels of a word) which you want to model together, then you want to fit a multivariate model (i.e. a model with multiple outcome variables). The same steps we went through above can be applied to each outcome variable. In some cases, you will want to use the same model structure for all the outcome variables, while in others you might want to use a different structure for each.
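
In brms, a multivariate model is built by combining one bf() formula per outcome. A sketch, assuming a hypothetical data frame vowels with columns F1, F2, vowel and speaker:

```r
library(brms)

# One formula per outcome variable; set_rescor(TRUE) also estimates
# the residual correlation between F1 and F2.
bf_f1 <- bf(F1 ~ vowel + (vowel | speaker))
bf_f2 <- bf(F2 ~ vowel + (vowel | speaker))

m_mv <- brm(bf_f1 + bf_f2 + set_rescor(TRUE), data = vowels)
```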

To learn more about multivariate models, I really recommend Bürkner (2024).

7 References

Bürkner, Paul-Christian. 2018. “Advanced Bayesian Multilevel Modeling with the R Package brms.” The R Journal 10 (1): 395–411. https://doi.org/10.32614/RJ-2018-017.
———. 2024. “Estimating Multivariate Models with Brms.” https://cran.r-project.org/web/packages/brms/vignettes/brms_multivariate.html.
Bürkner, Paul-Christian, and Matti Vuorre. 2019. “Ordinal Regression Models in Psychology: A Tutorial.” Advances in Methods and Practices in Psychological Science 2 (1): 77–101. https://doi.org/10.1177/2515245918823199.
Coretta, Stefano, and Paul-Christian Bürkner. n.d. “Bayesian Beta Regressions with brms in R: A Tutorial for Phoneticians.” https://doi.org/10.31219/osf.io/f9rqg_v1.
DeBruine, Lisa M., and Dale J. Barr. 2021. “Understanding Mixed-Effects Models Through Data Simulation.” Advances in Methods and Practices in Psychological Science 4 (1). https://doi.org/10.1177/2515245920965119.
Gelman, Andrew. 2005. “Analysis of Variance: Why It Is More Important Than Ever.” The Annals of Statistics 33 (1). https://doi.org/10.1214/009053604000001048.
Heiss, Andrew. 2021. “A Guide to Modeling Proportions with Bayesian Beta and Zero-Inflated Beta Regression Models.” https://www.andrewheiss.com/blog/2021/11/08/beta-regression-guide.
Kirby, James, and Morgan Sonderegger. 2018. “Mixed-Effects Design Analysis for Experimental Phonetics.” Journal of Phonetics 70: 70–85. https://doi.org/10.1016/j.wocn.2018.05.005.
Lorson, Alexandra, Chris Cummins, and Hannah Rohde. 2021. “Strategic Use of (Un)certainty Expressions.” Frontiers in Communication 6 (March): 635156. https://doi.org/10.3389/fcomm.2021.635156.
Pedersen, Eric J., David L. Miller, Gavin L. Simpson, and Noam Ross. 2019. “Hierarchical Generalized Additive Models in Ecology: An Introduction with mgcv.” PeerJ 7: e6876. https://doi.org/10.7717/peerj.6876.
Simpson, Gavin L. 2018. “Fitting GAMs with Brms: Part 1.” From the Bottom of the Heap. https://fromthebottomoftheheap.net/2018/04/21/fitting-gams-with-brms/.
Sóskuthy, Márton. 2017. “Generalised Additive Mixed Models for Dynamic Analysis in Linguistics: A Practical Introduction.” arXiv. https://doi.org/10.48550/arXiv.1703.05339.
———. 2021. “Evaluating Generalised Additive Mixed Modelling Strategies for Dynamic Speech Analysis.” Journal of Phonetics 84: 101017. https://doi.org/10.1016/j.wocn.2020.101017.
Veenman, Myrthe, Angelika M. Stefan, and Julia M. Haaf. 2023. “Bayesian Hierarchical Modeling: An Introduction and Reassessment.” Behavior Research Methods 56 (5): 4600–4631. https://doi.org/10.3758/s13428-023-02204-3.
Verissimo, Joao. 2021. “Analysis of Rating Scales: A Pervasive Problem in Bilingualism Research and a Solution with Bayesian Ordinal Models.” Bilingualism: Language and Cognition 24 (5): 842–848. https://doi.org/10.1017/S1366728921000316.
Wieling, Martijn. 2018. “Analyzing Dynamic Phonetic Data Using Generalized Additive Mixed Modeling: A Tutorial Focusing on Articulatory Differences Between L1 and L2 Speakers of English.” Journal of Phonetics 70: 86–116. https://doi.org/10.1016/j.wocn.2018.03.002.
Winter, Bodo. 2013. “Linear Models and Linear Mixed Effects Models in R with Linguistic Applications.” arXiv. https://arxiv.org/abs/1308.5499.
Winter, Bodo, and Paul-Christian Bürkner. 2021. “Poisson Regression for Linguists: A Tutorial Introduction to Modelling Count Data with Brms.” Language and Linguistics Compass 15 (11): e12439. https://doi.org/10.1111/lnc3.12439.

Footnotes

  1. Technically, the “simplest” linear model is \(y = f(x)\), but oh well…

  2. Note that, despite using family = binomial in glm(), under the hood a Bernoulli distribution is used.