QML - Week 7

Statistical significance

Suppose you have a treatment that you suspect may alter performance on a certain task. You compare the means of your control and experimental groups (say 20 subjects in each sample). Further, suppose a significance test indicates a significant difference, with a p-value of 0.01. Which of the following statements are FALSE?

You have absolutely disproved the null hypothesis (that is, there is no difference between the population means).
You have found the probability of the null hypothesis being true.
You have absolutely proved your experimental hypothesis (that there is a difference between the population means).
You can deduce the probability of the experimental hypothesis being true.
You know, if you decide to reject the null hypothesis, the probability that you are making the wrong decision.
You have a reliable experimental finding in the sense that if, hypothetically, the experiment were repeated a great number of times, you would obtain a significant result on 99% of occasions.

Which statements are false?

Statistical significance is not meaningful

Simulate N observations from two populations that have very similar means. Their difference is not linguistically meaningful.
N is sample size: from 50 to 500.
For each N, simulate N observations 100 times (100 draws). For each draw get the p-value of the difference. So 100 p-values for each N.
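
The simulation described above could be sketched roughly as follows. This is a minimal illustration, not the code used to produce the original results: the population means (0 and 0.1), the standard deviation (1), the step of 50 between sample sizes, the 0.05 threshold, and the function names are all assumptions chosen for the example. The p-value uses a normal approximation to Welch's t-test so that only the Python standard library is needed, which is reasonable for N of 50 or more but is not an exact t-test.

```python
import math
import random
import statistics


def two_sample_p_value(a, b):
    """Two-sided p-value for a difference in means.

    Uses a normal approximation to Welch's t-test (an assumption made
    to avoid external libraries; fine for the sample sizes used here).
    """
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    z = (statistics.fmean(a) - statistics.fmean(b)) / se
    return math.erfc(abs(z) / math.sqrt(2))


def simulate(sample_sizes, n_draws=100, mean_a=0.0, mean_b=0.1, sd=1.0, seed=1):
    """For each N, draw two samples of size N n_draws times and collect the p-values."""
    rng = random.Random(seed)
    results = {}
    for n in sample_sizes:
        p_values = []
        for _ in range(n_draws):
            a = [rng.gauss(mean_a, sd) for _ in range(n)]
            b = [rng.gauss(mean_b, sd) for _ in range(n)]
            p_values.append(two_sample_p_value(a, b))
        results[n] = p_values
    return results


# Hypothetical settings: N from 50 to 500 in steps of 50, 100 draws per N.
results = simulate(range(50, 501, 50))
for n, p_values in results.items():
    share = sum(p < 0.05 for p in p_values) / len(p_values)
    print(f"N = {n:3d}: {share:.0%} of draws have p < 0.05")
```

With a fixed, tiny difference between the populations, the share of draws that reach significance tends to grow as N grows, even though the difference itself never becomes more meaningful.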


What a p-value really means

P-value

A p-value is the probability of finding a difference as large as or larger than the difference found, assuming there is no difference (i.e. the null hypothesis that the difference is 0).
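
One way to make this definition concrete is a permutation test, sketched below. It is an illustration only and not necessarily the test used elsewhere in these slides: the scores are made-up numbers, and the function name and number of permutations are assumptions. Shuffling the group labels mimics a world in which there is no difference between the groups, and the p-value is the proportion of shuffles that give a difference at least as large as the observed one.

```python
import random
import statistics


def permutation_p_value(a, b, n_permutations=5000, seed=1):
    """Two-sided permutation p-value for a difference in means."""
    rng = random.Random(seed)
    observed = abs(statistics.fmean(a) - statistics.fmean(b))
    pooled = list(a) + list(b)
    at_least_as_large = 0
    for _ in range(n_permutations):
        # Shuffling the pooled data breaks any link between group and score,
        # which is what "assuming there is no difference" amounts to here.
        rng.shuffle(pooled)
        diff = abs(statistics.fmean(pooled[:len(a)]) - statistics.fmean(pooled[len(a):]))
        if diff >= observed:
            at_least_as_large += 1
    return at_least_as_large / n_permutations


# Made-up scores for illustration only.
control = [4.1, 5.0, 4.7, 5.3, 4.9, 5.1, 4.4, 4.8]
treatment = [5.2, 5.6, 4.9, 5.8, 5.4, 5.0, 5.7, 5.3]
print(permutation_p_value(control, treatment))
```

The number printed is a statement about the data under the assumption of no difference. It is not the probability that the null hypothesis is true.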

The “null hypothesis” is always assumed to be true when the p-value is calculated.
You can only reject the null hypothesis or fail to reject it; you cannot prove it or disprove it.
The p-value is not the probability that the difference is not 0.
The p-value does not tell you anything about the replicability of the results.

Group activity

Focus on the results sections of these papers and discuss how they compare with one another.

How are the statistical approaches different? Which papers focus more on general directional patterns and which on more precise estimation? How are the results weighted and used as evidence?

Let’s discuss!

Now share your thoughts with the class.