Quantitative Methods for LEL

.title[
# Quantitative Methods for LEL
]
.subtitle[
## Week 9 - Continuous predictors and interactions
]
.author[
### Elizabeth Pankratz and Stefano Coretta
]
.institute[
### University of Edinburgh
]
.date[
### 2023/11/14
]

---

---

## Categorical predictors, numerically coded

.bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[

Categorical predictors from earlier in the course:

- **`Modality`**, with levels `Taste` / `Smell`.

- **`Relation_type`**, with levels `Unrelated` / `Constituent` / `NonConstituent`.

- **`Branching`**, with levels `Left` / `Right`.

]

.bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[

**To be able to model these predictors,** the model codes them as 0/1s using the (default) treatment contrasts.

]

.bg-washed-green.b--dark-green.ba.bw2.br3.shadow-5.ph4.mt2[

Good news: **Continuous variables are already numeric!**

]

???

But, consequence: interpreting the model's estimates is a little different.

---

## Continuous predictors

---

.bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[

- **Reaction time** in milliseconds.

- **Age** in years.

- **Speech rate** in syllables per second.

- **Friendliness ratings** on a 0-100 scale.

]

---

The basic **equation for a line**:

where

$$
`\begin{aligned}
y: &~~~ \text{outcome variable} \\
b: &~~~ \text{intercept} \\
m: &~~~ \text{slope} \\
x: &~~~ \text{predictor variable} \\
\end{aligned}`
$$

<br>

.bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[
Let's see an example: `$y = 4 + 2x$`.
]

---

---

---

---

---

---

---

---

---

.bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[

Our linear models have the same form as the lines we just saw.

$$
`\begin{aligned}
y ~ &=& b ~~ &+& m &\cdot x \\ 
outcome ~ &=& intercept ~~ &+& slope &\cdot predictor \\
outcome ~ &=& \beta_0 ~~ &+& \beta_1 &\cdot predictor \\
\end{aligned}`
$$
]

.bg-washed-green.b--dark-green.ba.bw2.br3.shadow-5.ph4.mt2[

`$\beta_0$` corresponds to the line's `$y$`-intercept:

**The outcome value when the predictor equals zero.**

]

.bg-washed-green.b--dark-green.ba.bw2.br3.shadow-5.ph4.mt2[

And `$\beta_1$` has the same interpretation as `$m$`:

**The change in the outcome that results from a change of 1 (a.k.a., one unit change)<br> in the predictor.**

]

---

- Estimated value of the outcome when predictor(s) equal to zero. 
  - i.e., **at predictor's reference level**.
]
]

.pull-right[
.bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[
**Intercept ( `$\beta_0$` ):**

- Estimated value of the outcome when predictor(s) equal to zero. 
  - **on the predictor's scale**, e.g., <br> 0 ms, 0 kg, 0 metres.
  
]
]

.pull-left[
.bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[
**Effects (other `$\beta$`s):**

- **Difference** between estimated value of the outcome when predictor = 0 (reference level) and when predictor = 1 (other level).
]
]

.pull-right[
.bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[
**Effects (other `$\beta$`s):**

- Difference between estimated value of the outcome **for one unit change in predictor's value**.
]
]

---

**Difference in outcome for one unit change**

$$
\beta_0 + \beta_1 \cdot var
$$

<br>

$$
`\begin{aligned}
\text{var = 0 :}   & & \beta_0 &+ (\beta_1 \cdot 0) &&= \beta_0\\
\text{var = 1 :}   & & \beta_0 &+ (\beta_1 \cdot 1) &&= \beta_0 + \beta_1\\
\text{var = 2 :}   & & \beta_0 &+ (\beta_1 \cdot 2) &&= \beta_0 + 2\beta_1\\
\text{var = 3 :}   & & \beta_0 &+ (\beta_1 \cdot 3) &&= \beta_0 + 3\beta_1\\
\end{aligned}`
$$
`$$\vdots$$`

.bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[

**Every time `$var$` increases by one unit, the outcome increases by one `$\beta_1$`.** For example,

- if `$var$` is distance in metres, then a unit is one metre.
- if `$var$` is RT in milliseconds, then a unit is one millisecond.
- if `$var$` is speech rate in syllables/second, then a unit is one syllable/second.
]

---

Let's build a model predicting vowel duration by speech rate (from `dur-ita-pol.csv`)

]

]

---

???

Makes sense.
The faster you talk, the shorter your vowels are gonna be.

---

$$
`\begin{aligned}
log(duration) &\sim Gaussian(\mu, \sigma)\\
\mu &= \beta_0 + \beta_1 \cdot speech\_rate \\
\beta_0 &\sim Gaussian(\mu_0, \sigma_0)\\
\beta_1 &\sim Gaussian(\mu_1, \sigma_1)\\
\sigma &\sim TruncGaussian(\mu_2, \sigma_2)
\end{aligned}`
$$

```r
dur_sr_bm <- brm(
  log_v1_dur ~ speech_rate,
  family = gaussian(),
  data = dur_ita_pol,
  backend = 'cmdstanr',
  file = 'data/cache/dur_sr_bm'
)
```

---

```r
summary(dur_sr_bm)
```

```
##  Family: gaussian 
##   Links: mu = identity; sigma = identity 
## Formula: log_v1_dur ~ speech_rate 
##    Data: dur_ita_pol (Number of observations: 1334) 
##   Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup draws = 4000
## 
## Population-Level Effects: 
##             Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept       5.88      0.06     5.78     5.99 1.00     4371     3111
## speech_rate    -0.24      0.01    -0.26    -0.22 1.00     4340     3063
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma     0.27      0.01     0.26     0.28 1.00     3444     2667
## 
## Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
```

---

```
## Population-Level Effects: 
##             Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept       5.88      0.06     5.78     5.99 1.00     4371     3111
## speech_rate    -0.24      0.01    -0.26    -0.22 1.00     4340     3063
```

<br>

.bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[

**`Intercept`, a.k.a. `$\beta_0$`:**

- The logged vowel duration when speech rate is equal to zero is between 5.78 and 5.99 log-ms (324-399 ms), at 95% probability.

**`speech_rate`, a.k.a. `$\beta_1$`:**

- For one unit change in speech rate, logged vowel duration changes by –0.22 to -0.26 log-ms, at 95% probability.

]

.bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[

**But ... what's a speech rate of zero?**

]

---

---

## Centring continous predictors

---

.bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[

To centre a variable means to **transform it so that the mean of the centred version is zero.**

And, **conversely**, zero in the centred variable corresponds to the mean of the non-centred variable.
]

.bg-washed-green.b--dark-green.ba.bw2.br3.shadow-5.ph4.mt2[

**Subtract the variable's mean from every observation:**
]

```r
dur_ita_pol <- dur_ita_pol %>% 
  mutate(
    speech_rate_c = speech_rate - mean(speech_rate)
  )
```

---

```r
round(mean(dur_ita_pol$speech_rate), 2)
```

```
## [1] 5.45
```

]

```r
round(mean(dur_ita_pol$speech_rate_c))
```

```
## [1] 0
```

]

???

So we predict we'll see the same slope estimate.
But a different intercept estimate, since now the place where the line intersects with x = 0 is different.

---

```r
dur_sr_c_bm <- brm(
  log_v1_dur ~ speech_rate_c,
  family = gaussian(),
  data = dur_ita_pol,
  backend = 'cmdstanr',
  file = 'data/cache/dur_sr_c_bm'
)
```

```r
summary(dur_sr_c_bm)
```

```
##  Family: gaussian 
##   Links: mu = identity; sigma = identity 
## Formula: log_v1_dur ~ speech_rate_c 
##    Data: dur_ita_pol (Number of observations: 1334) 
##   Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup draws = 4000
## 
## Population-Level Effects: 
##               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept         4.59      0.01     4.58     4.61 1.00     4523     2901
## speech_rate_c    -0.24      0.01    -0.26    -0.22 1.00     4271     3037
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma     0.27      0.01     0.26     0.28 1.00     3738     2902
## 
## Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
```

---

```
## Population-Level Effects: 
##               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept         4.59      0.01     4.58     4.61 1.00     4523     2901
## speech_rate_c    -0.24      0.01    -0.26    -0.22 1.00     4271     3037
```

.bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[

**`Intercept`, a.k.a. `$\beta_0$`:**

- When centred speech rate is equal to zero, **i.e., when speech rate is at its mean value of 0,** <br> the log vowel duration is between 4.58 and 4.61 (98-100 ms).

- Before, uncentred: `$\beta_0$` = 5.88 (95% CrI: [5.78, 5.99]).

**`speech_rate_c`, a.k.a. `$\beta_1$`**, is the same as before:

- For one unit change in speech rate, logged vowel duration changes by –0.22 to -0.26 log-ms, at 95% probability.

]

.bg-washed-green.b--dark-green.ba.bw2.br3.shadow-5.ph4.mt2[

**Centring continuous predictors is pretty much always a good idea** to ease interpretation of the intercept.

]

???

Not going to show mathematical formulation of model with priors filled in—people can figure that out themselves.

Also not going to do the conditional posterior probability distribs—doesn't fit narratively.

---

## Interactions: continuous * categorical

---

**Speech rate * Italian vowels /a/ and /o/**

```r
# Subset the data
dur_ita <- dur_ita_pol %>% 
  filter(
    language == 'Italian',
    vowel %in% c('a', 'o')
  )

# Create this plot --->
dur_ita %>% 
  ggplot(aes(x = speech_rate_c, 
             y = log_v1_dur, 
             colour = vowel, 
             fill = vowel)) +
  geom_point() +
  geom_smooth(method = 'lm') +
  labs(
    x = 'centred speech rate (syll/sec)',
    y = 'Vowel duration (log ms)'
  )
```
]

.pull-right[
<img src="index_files/figure-html/dur-ita-plot-1.png" width="100%" style="display: block; margin: auto;" />
]

???

say out loud somewhere in here that "a" = 0, "o" = 1.

---

$$
`\begin{aligned}
log(duration) &\sim Gaussian(\mu, \sigma)\\
\mu &= \beta_0 + (\beta_1 \cdot speechrate) + (\beta_2 \cdot vowel_o) + (\beta_3 \cdot speechrate \cdot vowel_o) \\
\beta_0 &\sim Gaussian(\mu_0, \sigma_0)\\
\beta_1 &\sim Gaussian(\mu_1, \sigma_1)\\
\beta_2 &\sim Gaussian(\mu_2, \sigma_2)\\
\beta_3 &\sim Gaussian(\mu_3, \sigma_3)\\
\sigma &\sim TruncGaussian(\mu_4, \sigma_4)
\end{aligned}`
$$

<br>

```r
dur_sr_vow_bm <- brm(
  log_v1_dur ~ speech_rate_c + vowel + speech_rate_c:vowel,
  family = gaussian(),
  data = dur_ita,
  backend = 'cmdstanr', file = 'data/cache/dur_sr_vow_bm'
)
```

---

```r
summary(dur_sr_vow_bm)
```

```
##  Family: gaussian 
##   Links: mu = identity; sigma = identity 
## Formula: log_v1_dur ~ speech_rate_c + vowel + speech_rate_c:vowel 
##    Data: dur_ita (Number of observations: 599) 
##   Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup draws = 4000
## 
## Population-Level Effects: 
##                      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept                4.79      0.01     4.77     4.81 1.00     4722     2788
## speech_rate_c           -0.26      0.01    -0.29    -0.24 1.00     3396     2920
## vowelo                  -0.04      0.02    -0.07    -0.01 1.00     4693     2767
## speech_rate_c:vowelo     0.04      0.02     0.00     0.07 1.00     3137     2518
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma     0.18      0.01     0.17     0.19 1.00     3993     2366
## 
## Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
```

---

```
## Population-Level Effects: 
##                      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept                4.79      0.01     4.77     4.81 1.00     4722     2788
## speech_rate_c           -0.26      0.01    -0.29    -0.24 1.00     3396     2920
## vowelo                  -0.04      0.02    -0.07    -0.01 1.00     4693     2767
## speech_rate_c:vowelo     0.04      0.02     0.00     0.07 1.00     3137     2518
```

.bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[

**`Intercept`**, a.k.a. `$\beta_0$`:

- **When centred speech rate is 0 and when the vowel is /a/** (reference level), the log vowel duration is between 4.77 and 4.81, at 95% confidence (`$\beta$` = 4.79, SD = 0.01).

]

---

.bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[

**`speech_rate_c`**, a.k.a. `$\beta_1$`:

- **When the vowel is /a/ (reference), for each unit increase in speech rate**, the log vowel duration changes between –0.24 and -0.29, at 95% probability (`$\beta$` = -0.26, SD = 0.01).

]

.bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[

**`vowelo`**, a.k.a. `$\beta_2$`:

- **When centred speech rate is 0 (i.e. when speech rate is at its mean, 5.45), when going from /a/ to /o/**, the change in log vowel duration is between -0.07 and -0.01 at 95% confidence (`$\beta$` = –0.04, SD = 0.02).

]

---

<br>

.bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[
**`speech_rate_c:vowelo`**, a.k.a. `$\beta_3$`, interpretation 1:

- **When we go from /a/ to /o/, the effect of speech rate** increases by 0 to 0.07 at 95% confidence (`$\beta$` = 0.04, SD = 0.02).

]

.bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[
**`speech_rate_c:vowelo`**, a.k.a. `$\beta_3$`, interpretation 2:

- **When we increase speech rate by one syllable/second, the difference between /o/ and /a/** increases by 0 to 0.07 at 95% probability (`$\beta$` = 0.04, SD = 0.02).
]

---

.pull-left[
<img src="index_files/figure-html/dur-ita-plot2-1.png" width="100%" style="display: block; margin: auto;" />
]

.bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[
**1: No or small positive adjustment to effect of speech rate in /o/ relative to /a/**
- The effect of `speech_rate_c` is negative, so a positive adjustment brings the effect closer to zero.
- **The speech rate effect for /o/ might be a bit weaker than for /a/.**
]

.bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[
**2: No or small positive adjustment to effect of vowel when speech rate increases**
- The effect of `vowel` is negative, so a positive adjustment brings the effect closer to zero.
- **As speech rate increases, the difference between /a/ and /o/ decreases.**

]

---

## Conditional posterior probability distributions

---

.bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[

With continuous variables, we can **choose a few representative values** to compute conditional posterior distributions for. 
Here, we'll do (remember that speech rate is *centred*, so 0 is the mean speech rate 5.45):

- `$-1$` SD: one SD (0.77) below mean
  - `$0$`: mean (5.45)
  - `$+1$` SD: one SD (0.77) above mean

]

---

---

---

```r
conditional_effects(dur_sr_vow_bm, "speech_rate_c:vowel")
```

---

---

## Reporting

> We fitted a Bayesian model using a Gaussian distribution, with logged vowel duration as the outcome and with centred speech rate and vowel (/a/ vs /o/) as the predictors. An interaction between centred speech rate and vowel was also included. The vowel predictor was coded with the default treatment contrasts and the levels were ordered as given above (/a/ vs /o/).
>
> According to the model, the duration of /a/ at mean speech rate is 4.77-4.81 logged ms, at 95% confidence. That corresponds to 118-123 ms. The duration of /o/, at mean speech rate, is 4.73-4.77 logged ms, corresponding to 114-118 ms, at 95% confidence. For each one syllable per second increase in speech rate, the logged duration of /a/ decreases by 0.24-0.29 logged ms, at 95% probability. For /o/, logged vowel duration decreases by 0.2-0.25 logged ms.
>
> In sum, at mean speech rate, /o/ is expected to be somewhat shorter than /a/ (0.8-7.9 ms shorter at 95% confidence). Speech rate has a negative effect on vowel duration in both vowels, more so in /a/ than in /o/.