Statistical modelling with regression

Authors

Stefano Coretta

Elizabeth Pankratz

Important

When working on this challenge, make sure you are in the course Quarto Project you created earlier.

You know you are in a Quarto Project because you can see the name of the Project in the top-right corner of RStudio, next to the light-blue cube icon.

If you see Project (none) in the top-right corner, that means you are not in a Quarto Project.

To make sure you are in the right Quarto project, you can open the project by going to the project folder in File Explorer (Win) or Finder (macOS) and double click on the .Rproj file.

1 Instructions

For this challenge you have to practice statistical modelling using regression models with the brms package.

The following sections give some more specific instructions for three data frames. To learn more about the data frames, check the “Description” links.

2 Albanian VOT

Description

The file coretta2021/alb-vot.csv has acoustic measurements related to Voice Onset Time (VOT) in Albanian stops.

Use regression to answer the following question: What is the average VOT of voiceless stops in Albanian?

  • Calculate VOT from voi_onset and release.
  • Filter the data so that you only include voiceless stops.

3 Massive Auditory Lexical Decision (MALD)

Description

In tucker2019/mald_1_1.rds there is data from a lexical decision task: participants listened to a word and had to say if the word was a real word or not.

Answer the following question: What is the effect of phonetic Levinstein distance PhonLev on logged reaction times RT_log?

  • You need a regression with one outcome variable and one predictor.

4 Scalar inferences and scalar diversity

Description

The data in pankratz2021/si.csv is from a study that looked at scalar inferences. A scalar inference happens when you encounter a sentence like in (1) and you infer that Fatima didn’t eat all of the cookies.

  1. Fatima ate some of the cookies.

In particular, the study looks at the phenomenon of scalar diversity: the observation that scalar inferences are made at different rates for different words. For example, for “The food was cheap” (where “cheap” is a weaker scalar word), people do often infer that the food wasn’t free (a stronger word on the same scale of price). But, for “The village was pretty”, people don’t often infer that the village wasn’t beautiful.

Use a regression model to answer the following question: What is the relationship between the logged co-occurrence frequency of weak/strong adjective and their semantic distance?

  • Calculate the logarithm of freq.

  • Use a regression model with an outcome variable and a predictor.