Reporting your data analusis

Learn how to report summary measures, plot descriptions and regression models
Author

Stefano Coretta

Published

November 11, 2024

1 Overview

In this post, I will show you how you can report summary measures, describe plots and report regression models. I will illustrate this using one data table. You will practice this with different data I will provide the code of.

2 Reporting

Lets focus on the data from winter2012/polite.csv (details on the study and data here). (Remember to attach all the necessary packages!)

polite <- read_csv("data/winter2012/polite.csv")
Rows: 224 Columns: 27
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (6): subject, gender, birthplace, musicstudent, task, attitude
dbl (21): months_ger, scenario, total_duration, articulation_rate, f0mn, f0s...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

You would normally also provide information on the data itself, like number of participants, socio-demographic information, total number of observations and so on. I won’t illustrate that since it is self-explanatory. Check papers in your preferred area of linguistics to see what others do.

Let’s calculate the summary measures.

polite |> 
  group_by(attitude, gender) |> 
  summarise(
    mean_f0 = round(mean(f0mn, na.rm = TRUE)),
    sd_f0 = round(sd(f0mn, na.rm = TRUE))
  )
`summarise()` has grouped output by 'attitude'. You can override using the
`.groups` argument.

You can report them like so.

For female participants, the average f0 has a mean of 257 hz (SD = 40) in the informal condition and a mean of 239 hz (SD = 45) in the polite condition. There is a raw difference of 18 hz. For male participants, the average f0 has a mean of 137 hz (SD = 33) in the informal condition and 126 hz (SD = 34) in the polite condition, with a raw difference of 11 hz.

Remember to report a measure of central tendency and a measure of dispersion (a lot of researchers still just report measures of central tendency, but they are meaningless without an accompanying measure of dispersion, as you have now learnt).

Now let’s plot the data.

polite |> 
  drop_na(f0mn) |> 
  ggplot(aes(attitude, f0mn)) +
  geom_jitter(width = 0.1, alpha = 0.5) +
  stat_summary(fun.data = "mean_cl_boot", aes(colour = attitude)) +
  facet_grid(cols = vars(gender)) +
  labs(
    x = "Attitude", y = "Average f0 (hz)",
    caption = "The coloured dots with whiskers represent the mean with 95% bootsrapped Confidence Intervals."
  ) +
  theme(legend.position = "none")
Figure 1: Average f0 by attitude condition and gender

In the text you could write:

Figure 1 shows the average f0 by attitude (informal vs polite) and gender (female vs male). Each dot is the average f0 of a single participant. The plot also includes mean (coloured dots) with 95% bootstrapped Confidence Intervals. It can be observed that the average f0 of male speakers is generally lower than that of female speakers. Furthermore, there is a slight trend for f0 to be somewhat lower in the polite condition relative to the informal condition, in both genders (with females potentially showing a slightly larger difference than males). Finally, there seems to be more variability in average f0 within female than within male participants.

And now onto modelling.

polite_bm <- brm(
  f0mn ~ 0 + attitude:gender,
  family = gaussian,
  data = polite,
  cores = 4,
  seed = 7819,
  file = "cache/reporting-polite_bm"
)

summary(polite_bm, prob = 0.8)
 Family: gaussian 
  Links: mu = identity; sigma = identity 
Formula: f0mn ~ 0 + attitude:gender 
   Data: polite (Number of observations: 212) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Regression Coefficients:
                    Estimate Est.Error l-80% CI u-80% CI Rhat Bulk_ESS Tail_ESS
attitudeinf:genderF   256.72      5.06   250.03   263.20 1.00     5053     3197
attitudepol:genderF   238.88      4.91   232.56   245.25 1.00     5102     3171
attitudeinf:genderM   137.14      5.96   129.56   144.77 1.00     5184     3243
attitudepol:genderM   126.31      5.77   118.94   133.67 1.00     4657     3372

Further Distributional Parameters:
      Estimate Est.Error l-80% CI u-80% CI Rhat Bulk_ESS Tail_ESS
sigma    39.09      1.96    36.65    41.64 1.00     5046     3013

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
avg_comparisons(polite_bm, variables = "attitude", by = "gender", conf_level = 0.8)

 gender Estimate 10.0 % 90.0 %
      F    -17.8  -26.9 -8.760
      M    -10.9  -21.2 -0.495

Term: attitude
Type:  response 
Comparison: mean(pol) - mean(inf)
Columns: term, contrast, gender, estimate, conf.low, conf.high, predicted_lo, predicted_hi, predicted, tmp_idx 

You can report it this way:

We fitted a Bayesian regression model to average f0, using a Gaussian distribution as the distribution family of the outcome. As predictors, we included attitude (informal vs polite) and gender (female vs male). Both predictors were coded using indexing by suppressing the intercept in the model with the 0+ syntax.

According to the model, the average f0 for female participants is between 247–267 hz when speaking informally vs 230–248 when speaking politely, at 80% confidence. In males, the average f0 is between 126–149 hz in informal speech vs 115–138 hz in polite speech, at 80% probability. The difference in f0 between polite and informal speech is between -27 and -9 hz in females and between -21 and -0.5 hz in males; in other words, f0 is 9–27 hz lower in females and 0.5–21 hz lower in males when the speech is polite relative to informal speech, at 80% confidence.

You could also calculate and report the “difference of differences” (i.e. the difference between the informal/polite difference of females and the informal/polite difference of males). You can achieve that using the same function, avg_comparisions() (check the function’s documentation).

On interpreting Credible Intervals

Credible Intervals are both about the probability that the parameter value is within that interval and the confidence we can have that the value is within that interval.

So for example with a 75% CrI [-1, +3], we can say that:

  • There is a 75% probability that the value is between -1 and +3.
  • We can be 75% confident that the value is between -1 and +3.

Alas, the naming of “Confidence Intervals” is unfortunate, because they don’t measure confidence at all.

3 Practice

Now, look at the following code (feel free to try it out) and write a report of the results (summary measures, plots and regression model). If you have any questions, ask the tutors for help (we are not expecting that you know exactly what is going on with the code, that’s part of the challenge, so if you are confused, just ask!). We won’t release a solution, so either ask us or post your report on Piazza for feedback.

shallow <- read_csv("data/song2020/shallow.csv") |> 
  filter(Critical_Filler == "Critical") |> 
  mutate(Relation_type = factor(Relation_type, levels = c("Unrelated", "NonConstituent", "Constituent")))
Rows: 6500 Columns: 11
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (8): Group, ID, List, Target, Critical_Filler, Word_Nonword, Relation_ty...
dbl (3): ACC, RT, logRT

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
shallow |> 
  group_by(Group, Relation_type) |> 
  summarise(
    accuracy_prop = round(mean(ACC), 2)
  )
shallow |> 
  group_by(Group, ID, Relation_type) |> 
  summarise(
    accuracy_prop = round(mean(ACC), 2)
  ) |> 
  pivot_wider(names_from = Relation_type, values_from = accuracy_prop) |> 
  group_by(Group) |> 
  summarise(
    range(Unrelated), range(NonConstituent), range(Constituent)
  )
correct_prop <- shallow |>
  group_by(ID, Group, Relation_type) |>
  summarise(
  correct_prop = sum(ACC) / n(),
    # The following drops the grouping created by group_by() which we don't
    # need anymore.
    .groups = "drop"
  )

ggplot() +
  # Proportion of each participant
  geom_jitter(
    data = correct_prop,
    aes(x = Relation_type, y = correct_prop),
    width = 0.1, alpha = 0.5, height = 0
  ) +
  # Mean proportion by accuracy with confidence interval
  stat_summary(
    data = shallow,
    aes(x = Relation_type, y = ACC, colour = Relation_type),
    fun.data = "mean_cl_boot", size = 1
  ) +
  labs(
    title = "Proportion of correct responses by participant, relation type and group",
    caption = "Mean proportion is represented by coloured points with 95% bootstrapped Confidence Intervals.",
    x = "Relation type",
    y = "Proportion"
  ) +
  facet_grid(cols = vars(Group)) +
  ylim(0, 1) +
  geom_hline(yintercept = 0.5, linetype = "dashed") +
  theme(legend.position = "none")

shallow_bm <- brm(
  ACC ~ 0 + Group:Relation_type,
  family = bernoulli,
  data = shallow,
  cores = 4,
  seed = 1827,
  file = "cache/reporting-shallow_bm"
)

summary(shallow_bm, prob = 0.85)
 Family: bernoulli 
  Links: mu = logit 
Formula: ACC ~ 0 + Group:Relation_type 
   Data: shallow (Number of observations: 1950) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Regression Coefficients:
                                    Estimate Est.Error l-85% CI u-85% CI Rhat
GroupL1:Relation_typeUnrelated          1.62      0.14     1.41     1.82 1.00
GroupL2:Relation_typeUnrelated          1.36      0.12     1.20     1.54 1.00
GroupL1:Relation_typeNonConstituent     1.19      0.18     0.93     1.45 1.00
GroupL2:Relation_typeNonConstituent     1.08      0.15     0.86     1.30 1.00
GroupL1:Relation_typeConstituent        2.20      0.18     1.94     2.48 1.00
GroupL2:Relation_typeConstituent        1.74      0.13     1.56     1.92 1.00
                                    Bulk_ESS Tail_ESS
GroupL1:Relation_typeUnrelated          6178     3013
GroupL2:Relation_typeUnrelated          7031     3164
GroupL1:Relation_typeNonConstituent     5373     2991
GroupL2:Relation_typeNonConstituent     5679     2988
GroupL1:Relation_typeConstituent        6289     3118
GroupL2:Relation_typeConstituent        5413     3055

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
avg_comparisons(shallow_bm, variables = list(Relation_type = "pairwise"), by = "Group") |> 
  as.data.frame() |> 
  mutate(
    conf.low = round(conf.low, 2), conf.high = round(conf.high, 2)
  ) |> 
  select(Group, contrast, conf.low, conf.high)