class: center, middle, inverse, title-slide .title[ # Quantitative Methods for LEL ] .subtitle[ ## Week 9 - Continuous predictors and interactions ] .author[ ### Elizabeth Pankratz and Stefano Coretta ] .institute[ ### University of Edinburgh ] .date[ ### 2023/11/14 ] --- <iframe allowfullscreen frameborder="0" height="100%" mozallowfullscreen style="min-width: 500px; min-height: 355px" src="https://app.wooclap.com/events/SQQFXB/questions/6548ccc4fb81c902774a7e58" width="100%"></iframe> --- ## Categorical predictors, numerically coded .bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[ Categorical predictors from earlier in the course: - **`Modality`**, with levels `Taste` / `Smell`. - **`Relation_type`**, with levels `Unrelated` / `Constituent` / `NonConstituent`. - **`Branching`**, with levels `Left` / `Right`. ] .bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[ **To be able to model these predictors,** the model codes them as 0/1s using the (default) treatment contrasts. ] -- .bg-washed-green.b--dark-green.ba.bw2.br3.shadow-5.ph4.mt2[ Good news: **Continuous variables are already numeric!** ] ??? But, consequence: interpreting the model's estimates is a little different. --- layout: true ## Continuous predictors --- .bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[ - **Reaction time** in milliseconds. - **Age** in years. - **Speech rate** in syllables per second. - **Friendliness ratings** on a 0-100 scale. ] --- The basic **equation for a line**: .f3[ $$ y = b + m\cdot x $$ ] where $$ `\begin{aligned} y: &~~~ \text{outcome variable} \\ b: &~~~ \text{intercept} \\ m: &~~~ \text{slope} \\ x: &~~~ \text{predictor variable} \\ \end{aligned}` $$ <br> -- .bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[ Let's see an example: `\(y = 4 + 2x\)`. ] --- <img src="index_files/figure-html/line-pos-slope-1.png" width="60%" style="display: block; margin: auto;" /> --- <img src="index_files/figure-html/line-pos-slope2-1.png" width="60%" style="display: block; margin: auto;" /> --- <img src="index_files/figure-html/line-pos-slope3-1.png" width="60%" style="display: block; margin: auto;" /> --- <img src="index_files/figure-html/line-pos-slope4-1.png" width="60%" style="display: block; margin: auto;" /> --- <img src="index_files/figure-html/line-neg-slope-1.png" width="60%" style="display: block; margin: auto;" /> --- <img src="index_files/figure-html/line-neg-slope2-1.png" width="60%" style="display: block; margin: auto;" /> --- <img src="index_files/figure-html/line-neg-slope3-1.png" width="60%" style="display: block; margin: auto;" /> --- <img src="index_files/figure-html/line-neg-slope4-1.png" width="60%" style="display: block; margin: auto;" /> --- .bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[ Our linear models have the same form as the lines we just saw. $$ `\begin{aligned} y ~ &=& b ~~ &+& m &\cdot x \\ outcome ~ &=& intercept ~~ &+& slope &\cdot predictor \\ outcome ~ &=& \beta_0 ~~ &+& \beta_1 &\cdot predictor \\ \end{aligned}` $$ ] -- .bg-washed-green.b--dark-green.ba.bw2.br3.shadow-5.ph4.mt2[ `\(\beta_0\)` corresponds to the line's `\(y\)`-intercept: **The outcome value when the predictor equals zero.** ] -- .bg-washed-green.b--dark-green.ba.bw2.br3.shadow-5.ph4.mt2[ And `\(\beta_1\)` has the same interpretation as `\(m\)`: **The change in the outcome that results from a change of 1 (a.k.a., one unit change)<br> in the predictor.** ] --- .pull-left[ .center[ .f4[ **0/1 treatment-coded predictors** ] ] ] .pull-right[ .center[ .f4[ **Continuous predictors** ] ] ] -- .pull-left[ .bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[ **Intercept ( `\(\beta_0\)` ):** - Estimated value of the outcome when predictor(s) equal to zero. - i.e., **at predictor's reference level**. ] ] -- .pull-right[ .bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[ **Intercept ( `\(\beta_0\)` ):** - Estimated value of the outcome when predictor(s) equal to zero. - **on the predictor's scale**, e.g., <br> 0 ms, 0 kg, 0 metres. <!-- - or if centred: 0 = the predictor's mean --> ] ] -- .pull-left[ .bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[ **Effects (other `\(\beta\)`s):** - **Difference** between estimated value of the outcome when predictor = 0 (reference level) and when predictor = 1 (other level). ] ] -- .pull-right[ .bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[ **Effects (other `\(\beta\)`s):** - Difference between estimated value of the outcome **for one unit change in predictor's value**. ] ] --- **Difference in outcome for one unit change** -- $$ \beta_0 + \beta_1 \cdot var $$ -- <br> $$ `\begin{aligned} \text{var = 0 :} & & \beta_0 &+ (\beta_1 \cdot 0) &&= \beta_0\\ \text{var = 1 :} & & \beta_0 &+ (\beta_1 \cdot 1) &&= \beta_0 + \beta_1\\ \text{var = 2 :} & & \beta_0 &+ (\beta_1 \cdot 2) &&= \beta_0 + 2\beta_1\\ \text{var = 3 :} & & \beta_0 &+ (\beta_1 \cdot 3) &&= \beta_0 + 3\beta_1\\ \end{aligned}` $$ `$$\vdots$$` -- .bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[ **Every time `\(var\)` increases by one unit, the outcome increases by one `\(\beta_1\)`.** For example, - if `\(var\)` is distance in metres, then a unit is one metre. - if `\(var\)` is RT in milliseconds, then a unit is one millisecond. - if `\(var\)` is speech rate in syllables/second, then a unit is one syllable/second. ] --- Let's build a model predicting vowel duration by speech rate (from `dur-ita-pol.csv`) .pull-left[ <img src="index_files/figure-html/dens-dur-1.png" width="100%" style="display: block; margin: auto;" /> ] .pull-right[ <img src="index_files/figure-html/dens-sr-1.png" width="100%" style="display: block; margin: auto;" /> ] --- <img src="index_files/figure-html/scatter-dur-sr-1.png" width="60%" style="display: block; margin: auto;" /> ??? Makes sense. The faster you talk, the shorter your vowels are gonna be. --- $$ `\begin{aligned} log(duration) &\sim Gaussian(\mu, \sigma)\\ \mu &= \beta_0 + \beta_1 \cdot speech\_rate \\ \beta_0 &\sim Gaussian(\mu_0, \sigma_0)\\ \beta_1 &\sim Gaussian(\mu_1, \sigma_1)\\ \sigma &\sim TruncGaussian(\mu_2, \sigma_2) \end{aligned}` $$ ```r dur_sr_bm <- brm( log_v1_dur ~ speech_rate, family = gaussian(), data = dur_ita_pol, backend = 'cmdstanr', file = 'data/cache/dur_sr_bm' ) ``` --- ```r summary(dur_sr_bm) ``` ``` ## Family: gaussian ## Links: mu = identity; sigma = identity ## Formula: log_v1_dur ~ speech_rate ## Data: dur_ita_pol (Number of observations: 1334) ## Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1; ## total post-warmup draws = 4000 ## ## Population-Level Effects: ## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS ## Intercept 5.88 0.06 5.78 5.99 1.00 4371 3111 ## speech_rate -0.24 0.01 -0.26 -0.22 1.00 4340 3063 ## ## Family Specific Parameters: ## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS ## sigma 0.27 0.01 0.26 0.28 1.00 3444 2667 ## ## Draws were sampled using sample(hmc). For each parameter, Bulk_ESS ## and Tail_ESS are effective sample size measures, and Rhat is the potential ## scale reduction factor on split chains (at convergence, Rhat = 1). ``` --- ``` ## Population-Level Effects: ## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS ## Intercept 5.88 0.06 5.78 5.99 1.00 4371 3111 ## speech_rate -0.24 0.01 -0.26 -0.22 1.00 4340 3063 ``` <br> .bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[ **`Intercept`, a.k.a. `\(\beta_0\)`:** - The logged vowel duration when speech rate is equal to zero is between 5.78 and 5.99 log-ms (324-399 ms), at 95% probability. **`speech_rate`, a.k.a. `\(\beta_1\)`:** - For one unit change in speech rate, logged vowel duration changes by –0.22 to -0.26 log-ms, at 95% probability. ] -- .bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[ **But ... what's a speech rate of zero?** ] --- <img src="index_files/figure-html/scatter-dur-sr-vline-1.png" width="60%" style="display: block; margin: auto;" /> --- layout: false layout: true ## Centring continous predictors --- .bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[ To centre a variable means to **transform it so that the mean of the centred version is zero.** And, **conversely**, zero in the centred variable corresponds to the mean of the non-centred variable. ] -- .bg-washed-green.b--dark-green.ba.bw2.br3.shadow-5.ph4.mt2[ **Subtract the variable's mean from every observation:** ] ```r dur_ita_pol <- dur_ita_pol %>% mutate( speech_rate_c = speech_rate - mean(speech_rate) ) ``` --- .pull-left[ <img src="index_files/figure-html/scatter-dur-sr-2-1.png" width="100%" style="display: block; margin: auto;" /> ```r round(mean(dur_ita_pol$speech_rate), 2) ``` ``` ## [1] 5.45 ``` ] .pull-right[ <img src="index_files/figure-html/scatter-dur-sr-3-1.png" width="100%" style="display: block; margin: auto;" /> ```r round(mean(dur_ita_pol$speech_rate_c)) ``` ``` ## [1] 0 ``` ] ??? So we predict we'll see the same slope estimate. But a different intercept estimate, since now the place where the line intersects with x = 0 is different. --- ```r dur_sr_c_bm <- brm( log_v1_dur ~ speech_rate_c, family = gaussian(), data = dur_ita_pol, backend = 'cmdstanr', file = 'data/cache/dur_sr_c_bm' ) ``` -- ```r summary(dur_sr_c_bm) ``` ``` ## Family: gaussian ## Links: mu = identity; sigma = identity ## Formula: log_v1_dur ~ speech_rate_c ## Data: dur_ita_pol (Number of observations: 1334) ## Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1; ## total post-warmup draws = 4000 ## ## Population-Level Effects: ## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS ## Intercept 4.59 0.01 4.58 4.61 1.00 4523 2901 ## speech_rate_c -0.24 0.01 -0.26 -0.22 1.00 4271 3037 ## ## Family Specific Parameters: ## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS ## sigma 0.27 0.01 0.26 0.28 1.00 3738 2902 ## ## Draws were sampled using sample(hmc). For each parameter, Bulk_ESS ## and Tail_ESS are effective sample size measures, and Rhat is the potential ## scale reduction factor on split chains (at convergence, Rhat = 1). ``` --- ``` ## Population-Level Effects: ## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS ## Intercept 4.59 0.01 4.58 4.61 1.00 4523 2901 ## speech_rate_c -0.24 0.01 -0.26 -0.22 1.00 4271 3037 ``` -- .bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[ **`Intercept`, a.k.a. `\(\beta_0\)`:** - When centred speech rate is equal to zero, **i.e., when speech rate is at its mean value of 0,** <br> the log vowel duration is between 4.58 and 4.61 (98-100 ms). - Before, uncentred: `\(\beta_0\)` = 5.88 (95% CrI: [5.78, 5.99]). **`speech_rate_c`, a.k.a. `\(\beta_1\)`**, is the same as before: - For one unit change in speech rate, logged vowel duration changes by –0.22 to -0.26 log-ms, at 95% probability. ] -- .bg-washed-green.b--dark-green.ba.bw2.br3.shadow-5.ph4.mt2[ **Centring continuous predictors is pretty much always a good idea** to ease interpretation of the intercept. ] ??? Not going to show mathematical formulation of model with priors filled in—people can figure that out themselves. Also not going to do the conditional posterior probability distribs—doesn't fit narratively. --- layout: false layout: true ## Interactions: continuous * categorical --- **Speech rate * Italian vowels /a/ and /o/** .pull-left[ ```r # Subset the data dur_ita <- dur_ita_pol %>% filter( language == 'Italian', vowel %in% c('a', 'o') ) # Create this plot ---> dur_ita %>% ggplot(aes(x = speech_rate_c, y = log_v1_dur, colour = vowel, fill = vowel)) + geom_point() + geom_smooth(method = 'lm') + labs( x = 'centred speech rate (syll/sec)', y = 'Vowel duration (log ms)' ) ``` ] .pull-right[ <img src="index_files/figure-html/dur-ita-plot-1.png" width="100%" style="display: block; margin: auto;" /> ] ??? say out loud somewhere in here that "a" = 0, "o" = 1. --- $$ `\begin{aligned} log(duration) &\sim Gaussian(\mu, \sigma)\\ \mu &= \beta_0 + (\beta_1 \cdot speechrate) + (\beta_2 \cdot vowel_o) + (\beta_3 \cdot speechrate \cdot vowel_o) \\ \beta_0 &\sim Gaussian(\mu_0, \sigma_0)\\ \beta_1 &\sim Gaussian(\mu_1, \sigma_1)\\ \beta_2 &\sim Gaussian(\mu_2, \sigma_2)\\ \beta_3 &\sim Gaussian(\mu_3, \sigma_3)\\ \sigma &\sim TruncGaussian(\mu_4, \sigma_4) \end{aligned}` $$ <br> ```r dur_sr_vow_bm <- brm( log_v1_dur ~ speech_rate_c + vowel + speech_rate_c:vowel, family = gaussian(), data = dur_ita, backend = 'cmdstanr', file = 'data/cache/dur_sr_vow_bm' ) ``` --- ```r summary(dur_sr_vow_bm) ``` ``` ## Family: gaussian ## Links: mu = identity; sigma = identity ## Formula: log_v1_dur ~ speech_rate_c + vowel + speech_rate_c:vowel ## Data: dur_ita (Number of observations: 599) ## Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1; ## total post-warmup draws = 4000 ## ## Population-Level Effects: ## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS ## Intercept 4.79 0.01 4.77 4.81 1.00 4722 2788 ## speech_rate_c -0.26 0.01 -0.29 -0.24 1.00 3396 2920 ## vowelo -0.04 0.02 -0.07 -0.01 1.00 4693 2767 ## speech_rate_c:vowelo 0.04 0.02 0.00 0.07 1.00 3137 2518 ## ## Family Specific Parameters: ## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS ## sigma 0.18 0.01 0.17 0.19 1.00 3993 2366 ## ## Draws were sampled using sample(hmc). For each parameter, Bulk_ESS ## and Tail_ESS are effective sample size measures, and Rhat is the potential ## scale reduction factor on split chains (at convergence, Rhat = 1). ``` --- ``` ## Population-Level Effects: ## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS ## Intercept 4.79 0.01 4.77 4.81 1.00 4722 2788 ## speech_rate_c -0.26 0.01 -0.29 -0.24 1.00 3396 2920 ## vowelo -0.04 0.02 -0.07 -0.01 1.00 4693 2767 ## speech_rate_c:vowelo 0.04 0.02 0.00 0.07 1.00 3137 2518 ``` .bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[ **`Intercept`**, a.k.a. `\(\beta_0\)`: - **When centred speech rate is 0 and when the vowel is /a/** (reference level), the log vowel duration is between 4.77 and 4.81, at 95% confidence (`\(\beta\)` = 4.79, SD = 0.01). ] --- ``` ## Population-Level Effects: ## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS ## Intercept 4.79 0.01 4.77 4.81 1.00 4722 2788 ## speech_rate_c -0.26 0.01 -0.29 -0.24 1.00 3396 2920 ## vowelo -0.04 0.02 -0.07 -0.01 1.00 4693 2767 ## speech_rate_c:vowelo 0.04 0.02 0.00 0.07 1.00 3137 2518 ``` .bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[ **`speech_rate_c`**, a.k.a. `\(\beta_1\)`: - **When the vowel is /a/ (reference), for each unit increase in speech rate**, the log vowel duration changes between –0.24 and -0.29, at 95% probability (`\(\beta\)` = -0.26, SD = 0.01). ] -- .bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[ **`vowelo`**, a.k.a. `\(\beta_2\)`: - **When centred speech rate is 0 (i.e. when speech rate is at its mean, 5.45), when going from /a/ to /o/**, the change in log vowel duration is between -0.07 and -0.01 at 95% confidence (`\(\beta\)` = –0.04, SD = 0.02). ] --- ``` ## Population-Level Effects: ## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS ## Intercept 4.79 0.01 4.77 4.81 1.00 4722 2788 ## speech_rate_c -0.26 0.01 -0.29 -0.24 1.00 3396 2920 ## vowelo -0.04 0.02 -0.07 -0.01 1.00 4693 2767 ## speech_rate_c:vowelo 0.04 0.02 0.00 0.07 1.00 3137 2518 ``` <br> .bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[ **`speech_rate_c:vowelo`**, a.k.a. `\(\beta_3\)`, interpretation 1: - **When we go from /a/ to /o/, the effect of speech rate** increases by 0 to 0.07 at 95% confidence (`\(\beta\)` = 0.04, SD = 0.02). ] .bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[ **`speech_rate_c:vowelo`**, a.k.a. `\(\beta_3\)`, interpretation 2: - **When we increase speech rate by one syllable/second, the difference between /o/ and /a/** increases by 0 to 0.07 at 95% probability (`\(\beta\)` = 0.04, SD = 0.02). ] --- .pull-left[ <img src="index_files/figure-html/dur-ita-plot2-1.png" width="100%" style="display: block; margin: auto;" /> ] -- .pull-right[ .bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[ **1: No or small positive adjustment to effect of speech rate in /o/ relative to /a/** - The effect of `speech_rate_c` is negative, so a positive adjustment brings the effect closer to zero. - **The speech rate effect for /o/ might be a bit weaker than for /a/.** ] .bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[ **2: No or small positive adjustment to effect of vowel when speech rate increases** - The effect of `vowel` is negative, so a positive adjustment brings the effect closer to zero. - **As speech rate increases, the difference between /a/ and /o/ decreases.** ] ] --- layout: false layout: true ## Conditional posterior probability distributions --- .bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[ With continuous variables, we can **choose a few representative values** to compute conditional posterior distributions for. Here, we'll do (remember that speech rate is *centred*, so 0 is the mean speech rate 5.45): - `\(-1\)` SD: one SD (0.77) below mean - `\(0\)`: mean (5.45) - `\(+1\)` SD: one SD (0.77) above mean ] --- <img src="index_files/figure-html/dur-draws-dens-1.png" width="100%" style="display: block; margin: auto;" /> --- <img src="index_files/figure-html/dur-draws-dens-2-1.png" width="100%" style="display: block; margin: auto;" /> --- ```r conditional_effects(dur_sr_vow_bm, "speech_rate_c:vowel") ``` <img src="index_files/figure-html/cond-eff-1.png" width="60%" style="display: block; margin: auto;" /> --- <img src="index_files/figure-html/cond-eff-2-1.png" width="60%" style="display: block; margin: auto;" /> --- layout: false ## Reporting > We fitted a Bayesian model using a Gaussian distribution, with logged vowel duration as the outcome and with centred speech rate and vowel (/a/ vs /o/) as the predictors. An interaction between centred speech rate and vowel was also included. The vowel predictor was coded with the default treatment contrasts and the levels were ordered as given above (/a/ vs /o/). > > According to the model, the duration of /a/ at mean speech rate is 4.77-4.81 logged ms, at 95% confidence. That corresponds to 118-123 ms. The duration of /o/, at mean speech rate, is 4.73-4.77 logged ms, corresponding to 114-118 ms, at 95% confidence. For each one syllable per second increase in speech rate, the logged duration of /a/ decreases by 0.24-0.29 logged ms, at 95% probability. For /o/, logged vowel duration decreases by 0.2-0.25 logged ms. > > In sum, at mean speech rate, /o/ is expected to be somewhat shorter than /a/ (0.8-7.9 ms shorter at 95% confidence). Speech rate has a negative effect on vowel duration in both vowels, more so in /a/ than in /o/.