class: center, middle, inverse, title-slide

.title[
# Quantitative Methods for LEL
]
.subtitle[
## Week 7 - Binary outcomes
]
.author[
### Elizabeth Pankratz and Stefano Coretta
]
.institute[
### University of Edinburgh
]
.date[
### 2023/10/31
]

---

<iframe allowfullscreen frameborder="0" height="100%" mozallowfullscreen style="min-width: 500px; min-height: 355px" src="https://app.wooclap.com/events/SQQFXB/questions/653fab275b4da267a9d2bfde" width="100%"></iframe>

---

## Summary from last week

.bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[
- **Comparing groups** with `brm()`
  - `outcome ~ predictor`.
  - Categorical predictors with 2 or 3 levels.

- **Treatment coding** of categorical predictors.
  - N–1 **dummy variables**, where N is the number of levels in the predictor.
  - Level ordering is *alphabetical*, but you can specify your own.
  - **NOTE**: You don't have to apply treatment coding yourself! It's done under the hood by R. But you should **understand how it works**.

- **Remember:**
  - The **Intercept** `\(\beta_0\)` is the mean of the reference level.
  - The other `\(\beta\)`s are the **difference** of the other levels relative to the reference level.
]

---

## What are binary outcomes?

.bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[
**The variable you're trying to model has two levels**, e.g.:

- yes / no
- grammatical / ungrammatical
- Spanish / English
- indirect object (*gave the girl the book*) / to-PP (*gave the book to the girl*)
- correct / incorrect

Very common in linguistics!
]

---

layout: true

## Morphological processing

---

.bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[
- English L1 and L2 participants (the L2 participants are native speakers of Cantonese).
- **Lexical decision task:** Is the word a real English word or not?
- Each trial:
  - **Prime**: *prolong* (unrelated), *unkind* (constituent), *kindness* (non-constituent).
  - **Target**: *unkindness* (*[un-kind]-ness*, not *un-[kind-ness]*).
- Data gathered: Reaction times and accuracy.
]

--

.bg-washed-green.b--dark-green.ba.bw2.br3.shadow-5.ph4.mt2[
We will focus on **accuracy** (correct identification of real word: **correct/incorrect**) for L1 participants.
]

---

<img src="index_files/figure-html/shal-1.png" width="60%" style="display: block; margin: auto;" />

---

<img src="index_files/figure-html/shal-2-1.png" width="60%" style="display: block; margin: auto;" />

--

**Use this instead of bar charts with error bars!**

---

.bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[
We assume that there is **some probability `\(p\)` of responding correctly.**

We don't know this probability, so we want to **use the data to estimate it.**
]

--

.bg-washed-green.b--dark-green.ba.bw2.br3.shadow-5.ph4.mt2[
Remember that probabilities are between 0 and 1.

Probabilities of a binary variable follow the **Bernoulli distribution**.

`$$Bernoulli(p)$$`
]

---

layout: false
layout: true

## Binary outcomes

---

.bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[
`$$Bernoulli(p)$$`

- A Bernoulli distribution generates a 1 ("success") with probability `\(p\)`.
- And a 0 ("failure") with probability `\(1 - p = q\)`.

And we can **think of our binary outcomes in terms of 0s and 1s:**

- 0 = incorrect (or English, or no)
- 1 = correct (or Scots, or yes)
]

--

.bg-washed-green.b--dark-green.ba.bw2.br3.shadow-5.ph4.mt2[
Examples:

- There is a 62% (= 0.62) probability that the response is correct, which means there is a 38% (= 1 - 0.62 = 0.38) probability that the response is incorrect.

- There is an 89% (= 0.89) probability that the writer chooses an English spelling, which means there is an 11% (= 1 - 0.89 = 0.11) probability that the writer chooses a Scots spelling.
]

---

.f3[
$$
`\begin{aligned}
\text{acc} &\sim Bernoulli(p) \\
p &=\text{ ...}
\end{aligned}`
$$
]

--

.bg-washed-green.b--dark-green.ba.bw2.br3.shadow-5.ph4.mt2[
- Accuracy `\(\text{acc}\)` is a binary variable: incorrect vs. correct.
- `\(p\)` is the probability of obtaining a "correct" response.
- **Our goal: estimate `\(p\)`.**
]

.bg-washed-yellow.b--gold.ba.bw2.br3.shadow-5.ph4.mt2[
But there's a problem...
]

---

.bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[
**Linear models can't estimate bounded data (probabilities) out of the box!**

- A straight line, in principle, goes on forever in an unbounded space.
- But probabilities are bounded between 0 and 1.
- This means we can't fit a straight line to probabilities directly.
]

--

.bg-washed-green.b--dark-green.ba.bw2.br3.shadow-5.ph4.mt2[
The solution: **transform the probabilities into something that is not bounded**.
]

.bg-washed-yellow.b--gold.ba.bw2.br3.shadow-5.ph4.mt2[
**NOTE**: What follows is meant to help you understand how this transformation works. Remember that R does it for you automatically, so you never have to do it by hand!
]

---

.f3[
$$
`\begin{aligned}
\text{acc} &\sim Bernoulli(p) \\
logit(p) &=\text{ ...}
\end{aligned}`
$$
]

<br>

.bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[
- The **logit** (*log*istic un*it*) function converts probabilities to **log-odds**.
- Log-odds are simply the log(arithm) of the odds.
- The model can work with log-odds because they are not bounded.
]

???

LO 4

<!-- .bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[ -->
<!-- - The **logistic** function converts log-odds to **probabilities**. -->
<!-- - The logistic function is the *inverse* of the logit function. -->
<!-- ] -->

---

layout: false
layout: true

## Odds and log-odds

---

.bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[
**What are odds**?
$$ \text{odds} = \frac{\text{probability of a thing happening}}{\text{probability of the thing not happening}} = \frac{p}{1-p} $$ ] -- .bg-washed-green.b--dark-green.ba.bw2.br3.shadow-5.ph4.mt2[ If the probability of rain tomorrow is `\(p = 0.7\)`, then the odds of rain tomorrow are: $$ `\begin{aligned} \text{odds}_{rain} &= \frac{p}{1-p} \\ &= \frac{0.7}{1-0.7} \\ &= \frac{0.7}{0.3} \\ &= 2.333... \end{aligned}` $$ ] ??? No longer bounded by 1, but are odds unbounded? --- **Odds are still bounded (and also asymmetrical)** .center[  ] --- We can log the odds to make them unbounded. <img src="index_files/figure-html/logarithm-1.png" width="60%" style="display: block; margin: auto;" /> --- **Log-odds are unbounded (and symmetrical)** .center[  ] --- layout: false layout: true ## Log-odds and probabilities --- <img src="index_files/figure-html/p-log-odds-1.png" width="60%" style="display: block; margin: auto;" /> ??? On logit vs logistic function: <https://stats.stackexchange.com/a/120364>. --- Use `qlogis()` (*logit* function) to go from probabilities to log-odds. ```r qlogis(0.3) ``` ``` ## [1] -0.8472979 ``` ```r qlogis(0.5) ``` ``` ## [1] 0 ``` ```r qlogis(0.7) ``` ``` ## [1] 0.8472979 ``` --- Use `plogis()` (*logistic* function, the inverse of the logit function) to go from log-odds to probabilities. ```r plogis(-1) ``` ``` ## [1] 0.2689414 ``` ```r plogis(0) ``` ``` ## [1] 0.5 ``` ```r plogis(1) ``` ``` ## [1] 0.7310586 ``` --- layout: false layout: true ## Modelling accuracy --- .f4[ $$ `\begin{aligned} \text{acc} & \sim Bernoulli(p) \\ logit(p) & \sim Gaussian(\mu, \sigma) \\ \end{aligned}` $$ ] .bg-washed-yellow.b--gold.ba.bw2.br3.shadow-5.ph4.mt2[ - Because we are dealing with the outcome variable in the log-odds space, **all the parameters of the model are going to be estimated in log-odds.** - Here, this applies to `\(\mu\)` and `\(\sigma\)`. - We'll need to remember this when interpreting the model's estimates. 
]

---

.bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[
**We need to pick a level of our outcome variable to be the "success"** (i.e. the `1` in the Bernoulli outcome).

- Let's choose `"correct"` as the success, so that we estimate `\(p\)` as the probability of getting a "correct" response (or, in other words, the probability of responding correctly).

- To set this in our data, we can reorder the levels in `Accuracy` as `c("incorrect", "correct")`.

- Note that the "success" level is the *second* level! This is different from how reference levels work for categorical predictors, because now we are talking about *outcome variables*.
]

--

```r
shallow <- shallow %>%
  mutate(
    Accuracy = factor(Accuracy, levels = c("incorrect", "correct"))
  )

levels(shallow$Accuracy)
```

```
## [1] "incorrect" "correct"
```

---

```r
acc_bm <- brm(
  Accuracy ~ 1,
  family = bernoulli(),
  data = shallow,
  backend = "cmdstanr",
  file = "data/cache/acc_bm"
)
```

--

```r
summary(acc_bm)
```

```
##  Family: bernoulli 
##   Links: mu = logit 
## Formula: Accuracy ~ 1 
##    Data: shallow (Number of observations: 518) 
##   Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup draws = 4000
## 
## Population-Level Effects: 
##           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept     1.32      0.11     1.11     1.53 1.00     1218     1913
## 
## Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
```

---

```
## Population-Level Effects: 
##           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept     1.32      0.11     1.11     1.53 1.00     1218     1913
```

<br>

.f4[
$$
`\begin{aligned}
\text{acc} & \sim Bernoulli(p) \\
logit(p) & \sim Gaussian(\mu = 1.32, \sigma = 0.11)
\end{aligned}`
$$
]

<br>

- Parameter (`Intercept`): this is `\(logit(p)\)`.
- **Estimate**: `\(\mu = 1.32\)` log-odds.
- **Est.Error**: `\(\sigma = 0.11\)` log-odds.
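R's `plogis()` is just the logistic function, so the Estimate above is easy to translate back into a probability. A minimal sketch (Python here purely for illustration; the helper name `logistic()` is mine — in R you would simply run `plogis(1.32)`):

```python
import math

def logistic(x):
    """Inverse of the logit: maps log-odds to a probability in (0, 1)."""
    return 1 / (1 + math.exp(-x))

# The model's Intercept estimate, in log-odds
print(round(logistic(1.32), 2))  # 0.79, i.e. ~79% probability of "correct"
```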
---

layout: false

<iframe allowfullscreen frameborder="0" height="100%" mozallowfullscreen style="min-width: 500px; min-height: 355px" src="https://app.wooclap.com/events/SQQFXB/questions/65242dcc8f155ba53211f38e" width="100%"></iframe>

---

layout: true

## Modelling accuracy

---

<img src="index_files/figure-html/p-log-odds-2-1.png" width="60%" style="display: block; margin: auto;" />

---

<img src="index_files/figure-html/acc-int-1.png" width="60%" style="display: block; margin: auto;" />

There is a 95% probability that the log-odds of getting a "correct" response are between 1.11 and 1.53.

---

.bg-washed-green.b--dark-green.ba.bw2.br3.shadow-5.ph4.mt2[
It's easier to understand what that means if we convert log-odds to probabilities using the **logistic function** (which is the inverse of the logit function):

**There is a 95% probability that the probability of getting a "correct" response is between 0.75 and 0.82.**
]

```r
round(plogis(1.11), 2)
```

```
## [1] 0.75
```

```r
round(plogis(1.53), 2)
```

```
## [1] 0.82
```

---

layout: false

## Reporting

> We fitted a Bayesian linear model to accuracy with a Bernoulli distribution. According to the model, there is a 95% probability that the probability of getting a "correct" response is between 0.75 and 0.82 (`\(\beta\)` = 1.32, SD = 0.11).

--

Note that it is easier to interpret probabilities than log-odds, so I recommend reporting the probabilities in the running text and the log-odds estimates in parentheses.
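As a quick cross-check of the reported interval: applying the ±2σ empirical rule (covered later in these slides) to the Estimate and Est.Error, then converting with the logistic function, recovers approximately the same probabilities. A sketch in Python, for illustration only (in R: `plogis(1.32 - 2 * 0.11)` and `plogis(1.32 + 2 * 0.11)`; the `logistic()` helper is mine):

```python
import math

def logistic(x):
    # Inverse of the logit: maps log-odds to probabilities
    return 1 / (1 + math.exp(-x))

mu, sigma = 1.32, 0.11  # Intercept Estimate and Est.Error, in log-odds

# Empirical rule: mu +/- 2*sigma spans roughly the central 95%
lo, hi = mu - 2 * sigma, mu + 2 * sigma
print(round(logistic(lo), 2), round(logistic(hi), 2))  # 0.75 0.82
```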
---

layout: false
layout: true

## Modelling accuracy by relation type

---

```r
shallow <- shallow %>%
  mutate(
    Relation_type = factor(Relation_type, levels = c("Unrelated", "NonConstituent", "Constituent"))
  )

levels(shallow$Relation_type)
```

```
## [1] "Unrelated"      "NonConstituent" "Constituent"
```

<br>
<br>

|                                | NonConstituent | Constituent |
|--------------------------------|----------------|-------------|
| Relation type = Unrelated      | 0              | 0           |
| Relation type = NonConstituent | 1              | 0           |
| Relation type = Constituent    | 0              | 1           |

---

.f3[
$$
`\begin{aligned}
\text{acc} & \sim Bernoulli(p) \\
logit(p) & = \beta_0 + \beta_1 \cdot relation_{ncons} + \beta_2 \cdot relation_{cons} \\
\beta_0 & \sim Gaussian(\mu_0, \sigma_0) \\
\beta_1 & \sim Gaussian(\mu_1, \sigma_1) \\
\beta_2 & \sim Gaussian(\mu_2, \sigma_2) \\
\end{aligned}`
$$
]

```r
acc_bm_2 <- brm(
  Accuracy ~ Relation_type,
  family = bernoulli(),
  data = shallow,
  backend = "cmdstanr",
  file = "data/cache/acc_bm_2"
)
```

---

```
## Population-Level Effects: 
##                             Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept                       1.02      0.17     0.71     1.36 1.00     2777     2484
## Relation_typeNonConstituent     0.19      0.25    -0.30     0.67 1.00     2948     2801
## Relation_typeConstituent        0.86      0.29     0.30     1.42 1.00     2999     2789
```

$$
`\begin{aligned}
\text{acc} & \sim Bernoulli(p) \\
logit(p) & = \beta_0 + \beta_1 \cdot relation_{ncons} + \beta_2 \cdot relation_{cons} \\
\beta_0 & \sim Gaussian(1.02, 0.17) \\
\beta_1 & \sim Gaussian(\mu_1, \sigma_1) \\
\beta_2 & \sim Gaussian(\mu_2, \sigma_2) \\
\end{aligned}`
$$

- Parameter `Intercept`: this is `\(\beta_0\)` (the log-odds of a correct response when relation type is "unrelated").
- **Estimate**: `\(\mu = 1.02\)` log-odds.
- **Est.Error**: `\(\sigma = 0.17\)` log-odds.
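The treatment coding shown in the table earlier in this section can be made concrete with a small sketch (Python, purely for illustration — R builds these dummy variables for you under the hood; the helper name `dummy_code()` is mine):

```python
# Treatment (dummy) coding for a 3-level predictor: N - 1 = 2 dummy variables.
# Level order as set in the data: Unrelated (reference), NonConstituent, Constituent.
def dummy_code(level):
    """Return (relation_ncons, relation_cons) for a given level."""
    return (int(level == "NonConstituent"), int(level == "Constituent"))

for lvl in ["Unrelated", "NonConstituent", "Constituent"]:
    print(lvl, dummy_code(lvl))
# Unrelated (0, 0)
# NonConstituent (1, 0)
# Constituent (0, 1)
```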
--- ``` ## Population-Level Effects: ## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS ## Intercept 1.02 0.17 0.71 1.36 1.00 2777 2484 ## Relation_typeNonConstituent 0.19 0.25 -0.30 0.67 1.00 2948 2801 ## Relation_typeConstituent 0.86 0.29 0.30 1.42 1.00 2999 2789 ``` $$ `\begin{aligned} \text{acc} & \sim Bernoulli(p) \\ logit(p) & = \beta_0 + \beta_1 \cdot relation_{ncons} + \beta_2 \cdot relation_{cons} \\ \beta_0 & \sim Gaussian(1.02, 0.17) \\ \beta_1 & \sim Gaussian(\mu_1, \sigma_1) \\ \beta_2 & \sim Gaussian(\mu_2, \sigma_2) \\ \end{aligned}` $$ .bg-washed-green.b--dark-green.ba.bw2.br3.shadow-5.ph4.mt2[ **When relation type is `Unrelated`:** - The mean log-odds of getting a correct response is 1.02 (which corresponds to a 73% probability). - Based on the CrIs, there is a 95% probability that the mean is between 0.71 and 1.36 log-odds (which corresponds to a probability between 0.67 and 0.8). ] --- ``` ## Population-Level Effects: ## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS ## Intercept 1.02 0.17 0.71 1.36 1.00 2777 2484 ## Relation_typeNonConstituent 0.19 0.25 -0.30 0.67 1.00 2948 2801 ## Relation_typeConstituent 0.86 0.29 0.30 1.42 1.00 2999 2789 ``` $$ `\begin{aligned} \text{acc} & \sim Bernoulli(p) \\ logit(p) & = \beta_0 + \beta_1 \cdot relation_{ncons} + \beta_2 \cdot relation_{cons} \\ \beta_0 & \sim Gaussian(1.02, 0.17) \\ \beta_1 & \sim Gaussian(0.19, 0.25) \\ \beta_2 & \sim Gaussian(\mu_2, \sigma_2) \\ \end{aligned}` $$ - **Effect of `\(relation_{ncons}\)`**: `\(\beta_1\)` (the difference in log-odds between non-constituent and unrelated). - **Estimate**: `\(\mu = 0.19\)` log-odds. - **Est.Error**: `\(\sigma = 0.25\)` log-odds. 
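Putting the coefficients together: each conditional probability comes from plugging the dummy variables into the linear predictor and applying the logistic function. A sketch for illustration (Python; in R you would use `plogis(1.02 + 0.19)` and so on; `logistic()` is my helper name):

```python
import math

def logistic(x):
    # Inverse of the logit: maps log-odds to probabilities
    return 1 / (1 + math.exp(-x))

# Posterior means from the model output above, in log-odds
b0, b1, b2 = 1.02, 0.19, 0.86

# logit(p) = b0 + b1 * ncons + b2 * cons, using the treatment coding
for name, ncons, cons in [("Unrelated", 0, 0),
                          ("NonConstituent", 1, 0),
                          ("Constituent", 0, 1)]:
    p = logistic(b0 + b1 * ncons + b2 * cons)
    print(name, round(p, 2))
# Unrelated 0.73
# NonConstituent 0.77
# Constituent 0.87
```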
---

```
## Population-Level Effects: 
##                             Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept                       1.02      0.17     0.71     1.36 1.00     2777     2484
## Relation_typeNonConstituent     0.19      0.25    -0.30     0.67 1.00     2948     2801
## Relation_typeConstituent        0.86      0.29     0.30     1.42 1.00     2999     2789
```

$$
`\begin{aligned}
\text{acc} & \sim Bernoulli(p) \\
logit(p) & = \beta_0 + \beta_1 \cdot relation_{ncons} + \beta_2 \cdot relation_{cons} \\
\beta_0 & \sim Gaussian(1.02, 0.17) \\
\beta_1 & \sim Gaussian(0.19, 0.25) \\
\beta_2 & \sim Gaussian(\mu_2, \sigma_2) \\
\end{aligned}`
$$

.bg-washed-green.b--dark-green.ba.bw2.br3.shadow-5.ph4.mt2[
**When the relation type is non-constituent**:

- The mean change in log-odds is 0.19.
- **You cannot convert this change to a probability directly!**
- We need to calculate the conditional probability of a "correct" response when the relation type is non-constituent.
]

---

```
## Population-Level Effects: 
##                             Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept                       1.02      0.17     0.71     1.36 1.00     2777     2484
## Relation_typeNonConstituent     0.19      0.25    -0.30     0.67 1.00     2948     2801
## Relation_typeConstituent        0.86      0.29     0.30     1.42 1.00     2999     2789
```

$$
`\begin{aligned}
logit(p) &= \beta_0 + \beta_1 \cdot relation_{ncons} + \beta_2 \cdot relation_{cons} \\
&= 1.02 + (0.19 \cdot 1) + (\beta_2 \cdot 0) \\
&= 1.02 + 0.19 \\
&= 1.21 \\
p &= logistic(1.21) \\
&= 0.77 \\
\end{aligned}`
$$

--

.bg-washed-green.b--dark-green.ba.bw2.br3.shadow-5.ph4.mt2[
**When the relation type is non-constituent**, the mean probability of responding correctly is 0.77.
] -- ```r round(plogis(1.02 + 0.19), 2) ``` ``` ## [1] 0.77 ``` --- <img src="index_files/figure-html/acc-bm-2-cond-1.png" width="60%" style="display: block; margin: auto;" /> --- ``` ## Population-Level Effects: ## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS ## Intercept 1.02 0.17 0.71 1.36 1.00 2777 2484 ## Relation_typeNonConstituent 0.19 0.25 -0.30 0.67 1.00 2948 2801 ## Relation_typeConstituent 0.86 0.29 0.30 1.42 1.00 2999 2789 ``` <br> ```r round(plogis(1.02 + 0.86), 2) ``` ``` ## [1] 0.87 ``` .bg-washed-green.b--dark-green.ba.bw2.br3.shadow-5.ph4.mt2[ **When the relation type is constituent**, there is on average an 87% probability of a "correct" response. ] ??? When the relation type is constituent, the mean change in log-odds is 0.86. --- <img src="index_files/figure-html/acc-bm-2-cond-2-1.png" width="60%" style="display: block; margin: auto;" /> --- layout: false ## How do we find the 95% CrIs? .bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[ **The empirical rule** (a.k.a., the 68–95–99.7 rule) ] --- layout: false layout: true ## The empirical rule --- <img src="index_files/figure-html/empirical-rule-0-1.png" width="60%" style="display: block; margin: auto;" /> --- <img src="index_files/figure-html/empirical-rule-1-1.png" width="60%" style="display: block; margin: auto;" /> --- <img src="index_files/figure-html/empirical-rule-2-1.png" width="60%" style="display: block; margin: auto;" /> --- <img src="index_files/figure-html/empirical-rule-3-1.png" width="60%" style="display: block; margin: auto;" /> --- <img src="index_files/figure-html/empirical-rule-4-1.png" width="60%" style="display: block; margin: auto;" /> --- <img src="index_files/figure-html/empirical-rule-5-1.png" width="60%" style="display: block; margin: auto;" /> ??? As a general rule, `\(\pm2\sigma\)` covers 95% of the Gaussian distribution, which means that there's a 95% probability that the value lies within that range. 
--- layout: false layout: true ## Computing CrIs using quantiles --- .bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[ - **Quantiles** are cut points that divide a continuous probability distribution into intervals with equal probability. - Common quantiles: - Quartiles (4 intervals, 25% of the data each). - Percentiles or centiles (100 intervals). ] --- <img src="index_files/figure-html/quartiles-1.png" width="60%" style="display: block; margin: auto;" /> --- <img src="index_files/figure-html/quart-1-1.png" width="60%" style="display: block; margin: auto;" /> --- <img src="index_files/figure-html/quart-2-1.png" width="60%" style="display: block; margin: auto;" /> --- <img src="index_files/figure-html/quart-3-1.png" width="60%" style="display: block; margin: auto;" /> --- <img src="index_files/figure-html/centiles-1.png" width="60%" style="display: block; margin: auto;" /> --- <img src="index_files/figure-html/cent-96-1.png" width="60%" style="display: block; margin: auto;" /> --- <img src="index_files/figure-html/cent-95-1.png" width="60%" style="display: block; margin: auto;" /> --- <img src="index_files/figure-html/cent-80-1.png" width="60%" style="display: block; margin: auto;" /> --- <img src="index_files/figure-html/cent-60-1.png" width="60%" style="display: block; margin: auto;" /> --- ```r acc_bm_2_draws ``` ``` ## # A tibble: 12,000 × 2 ## Relation_type value ## <fct> <dbl> ## 1 Unrelated 0.773 ## 2 NonConstituent 1.16 ## 3 Constituent 2.18 ## 4 Unrelated 1.22 ## 5 NonConstituent 1.32 ## 6 Constituent 1.92 ## 7 Unrelated 0.881 ## 8 NonConstituent 1.46 ## 9 Constituent 1.55 ## 10 Unrelated 0.845 ## # ℹ 11,990 more rows ``` --- Calculate the **conditional posteriors** of the probability of a "correct" response in each relation type (as log-odds and probabilities). 
```r library(posterior) # The 95% CrI acc_bm_2_draws %>% group_by(Relation_type) %>% summarise( q95_lo = quantile2(value, probs = 0.025), # the 2.5th centile q95_hi = quantile2(value, probs = 0.975), # the 97.5th centile p_q95_lo = round(plogis(q95_lo), 2), p_q95_hi = round(plogis(q95_hi), 2) ) ``` ``` ## # A tibble: 3 × 5 ## Relation_type q95_lo q95_hi p_q95_lo p_q95_hi ## <fct> <dbl> <dbl> <dbl> <dbl> ## 1 Unrelated 0.705 1.36 0.67 0.8 ## 2 NonConstituent 0.853 1.57 0.7 0.83 ## 3 Constituent 1.44 2.35 0.81 0.91 ``` --- Calculate the **difference** between unrelated and non-constituent/constituent in **percent points**. ```r as_draws_df(acc_bm_2) %>% mutate( NonConstituent_d = plogis(b_Intercept + b_Relation_typeNonConstituent) - plogis(b_Intercept), Constituent_d = plogis(b_Intercept + b_Relation_typeConstituent) - plogis(b_Intercept) ) %>% summarise( NC_p_95_lo = round(quantile2(NonConstituent_d, probs = 0.025), 2), NC_p_q95_hi = round(quantile2(NonConstituent_d, probs = 0.975), 2), C_p_95_lo = round(quantile2(Constituent_d, probs = 0.025), 2), C_p_q95_hi = round(quantile2(Constituent_d, probs = 0.975), 2) ) ``` ``` ## # A tibble: 1 × 4 ## NC_p_95_lo NC_p_q95_hi C_p_95_lo C_p_q95_hi ## <dbl> <dbl> <dbl> <dbl> ## 1 -0.05 0.12 0.05 0.21 ``` --- layout: false ## Reporting > We fitted a Bayesian model to accuracy, with a Bernoulli distribution family and relation type (unrelated, non-constituent, constituent) as the only predictor. Relation type was coded using the default treatment contrasts with the unrelated level as the reference level. > > Based on the model, there is a 67-80% probability of a "correct" response when the relation type is unrelated, at 95% confidence (`\(\beta\)` = 1.02, SD = 0.17). When the type is non-constituent, the probability of a "correct" response is 70-83%; at 95% confidence, the difference between unrelated and non-constituent types is between -5 and +12 percent points (`\(\beta\)` = 0.19, SD = 0.25). 
When the type is constituent, the probability of a "correct" response is 81-91%; at 95% confidence, the difference between unrelated and constituent types is between +5 and +21 percent points (`\(\beta\)` = 0.86, SD = 0.29).
>
> While the results suggest a robust increase in the probability of a "correct" response in the constituent vs unrelated conditions, ranging from 5 to 21 percent points, the difference between non-constituent and unrelated is less certain, with a range that includes both negative and positive values (-5 to +12 points). Note, however, that this interval leans more towards positive than negative values, suggesting that if there is a difference it is probably positive, although smaller than the difference between constituent and unrelated.
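For completeness: the point estimates behind the percent-point differences reported above can be recovered directly from the coefficients, as the difference between the two conditional probabilities. A sketch for illustration (Python; in R: `plogis(1.02 + 0.86) - plogis(1.02)`; `logistic()` is my helper name):

```python
import math

def logistic(x):
    # Inverse of the logit: maps log-odds to probabilities
    return 1 / (1 + math.exp(-x))

b0, b1, b2 = 1.02, 0.19, 0.86  # posterior means, in log-odds

# Point-estimate differences in percent points (cf. the full posterior
# intervals computed with as_draws_df() above)
d_ncons = logistic(b0 + b1) - logistic(b0)
d_cons = logistic(b0 + b2) - logistic(b0)
print(round(d_ncons * 100), round(d_cons * 100))  # 4 13
```

Both values sit inside the corresponding 95% CrIs (-5 to +12 and +5 to +21 percent points), which is a useful sanity check on the reported intervals.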