Frequentist statistics and the Null Ritual
Suppose you have a treatment that you suspect may alter performance on a certain task. You compare the means of your control and experimental groups (say 20 subjects in each sample). Further, suppose a significance test indicates a significant difference, with a p-value of 0.01. Which of the following statements are FALSE?
You have absolutely disproved the null hypothesis (that there is no difference between the population means).
You have found the probability of the null hypothesis being true.
You have absolutely proved your experimental hypothesis (that there is a difference between the population means).
You can deduce the probability of the experimental hypothesis being true.
You know, if you decide to reject the null hypothesis, the probability that you are making the wrong decision.
You have a reliable experimental finding in the sense that if, hypothetically, the experiment were repeated a great number of times, you would obtain a significant result on 99% of occasions.
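The last statement can be probed with a small simulation. This is a minimal sketch under two simplifying assumptions that the scenario does not state. First, the population SD is known, so a z-test replaces the t-test a real analysis would use. Second, the true difference is exactly the one that produced p = 0.01 with 20 subjects per group, which is an optimistic assumption. It also assumes that "significant" in a replication means p < .05. The sketch reruns the experiment many times, counts how often a replication reaches significance, and checks the count against the closed-form answer. All names are illustrative.

```python
import math
import random
from statistics import NormalDist

N_PER_GROUP = 20      # from the scenario: 20 subjects in each sample
SD = 1.0              # assumption: known population SD, so a z-test applies
P_OBSERVED = 0.01     # the original study's two-sided p-value
ALPHA = 0.05          # assumption: a replication "succeeds" if p < .05

std_normal = NormalDist()
se = SD * math.sqrt(2 / N_PER_GROUP)              # standard error of the difference in means
z_obs = std_normal.inv_cdf(1 - P_OBSERVED / 2)    # z statistic that yields p = .01
true_diff = z_obs * se                            # optimistic: truth equals the observed effect

def replicate(rng):
    """Run one hypothetical replication and return its two-sided p-value."""
    ctrl = [rng.gauss(0.0, SD) for _ in range(N_PER_GROUP)]
    expt = [rng.gauss(true_diff, SD) for _ in range(N_PER_GROUP)]
    z = (sum(expt) / N_PER_GROUP - sum(ctrl) / N_PER_GROUP) / se
    return math.erfc(abs(z) / math.sqrt(2))       # P(|Z| > |z|) for a standard normal

rng = random.Random(2024)
n_reps = 20_000
share_sig = sum(replicate(rng) < ALPHA for _ in range(n_reps)) / n_reps

# Closed-form check: P(|Z| > z_crit) when Z ~ Normal(z_obs, 1)
z_crit = std_normal.inv_cdf(1 - ALPHA / 2)
analytic = (1 - std_normal.cdf(z_crit - z_obs)) + std_normal.cdf(-z_crit - z_obs)

print(f"simulated replication rate: {share_sig:.3f}")
print(f"analytic replication rate:  {analytic:.3f}")
```

Even under these favourable assumptions, both rates come out near 0.73, not 0.99. If "significant" is read as p < .01 instead, the analytic rate drops to about 0.5, because z_obs then sits exactly at the critical value.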
Simulate N observations from two populations whose means differ only slightly, by an amount too small to be linguistically meaningful.
N is the sample size, ranging from 50 to 500.
For each N, simulate the N observations 100 times (100 draws). For each draw, compute the p-value of the difference between the two samples. This yields 100 p-values for each N.
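The simulation steps above can be sketched in Python using only the standard library. Several details are assumptions, since the exercise leaves them open. N is taken as the number of observations per population, and it runs from 50 to 500 in steps of 50. The populations are normal with SD 1 and means 0.2 SD apart. A large-sample z-test with a Welch-style standard error stands in for the t-test one would normally run (for example with SciPy's `ttest_ind`). All function and constant names are hypothetical.

```python
import math
import random
import statistics

# Hypothetical settings: the exercise does not specify these values.
MU_A, MU_B = 0.0, 0.2            # population means, 0.2 SD apart
SD = 1.0                         # common population SD
N_VALUES = range(50, 501, 50)    # assumption: N per population, steps of 50
N_DRAWS = 100                    # 100 draws per N, as in the exercise

def two_sample_p(x, y):
    """Two-sided p-value for the difference in means of x and y.

    Assumption: a large-sample z-test (Welch-style standard error, normal
    reference distribution) stands in for a t-test so that only the standard
    library is needed; for N >= 50 per group the two agree closely.
    """
    se = math.sqrt(statistics.variance(x) / len(x) + statistics.variance(y) / len(y))
    z = (statistics.fmean(x) - statistics.fmean(y)) / se
    return math.erfc(abs(z) / math.sqrt(2))

def simulate(n_values, n_draws, seed=1):
    """Return {N: list of n_draws p-values}, one p-value per simulated draw."""
    rng = random.Random(seed)
    pvals = {}
    for n in n_values:
        pvals[n] = []
        for _ in range(n_draws):
            a = [rng.gauss(MU_A, SD) for _ in range(n)]
            b = [rng.gauss(MU_B, SD) for _ in range(n)]
            pvals[n].append(two_sample_p(a, b))
    return pvals

results = simulate(N_VALUES, N_DRAWS)
for n, ps in results.items():
    share_sig = sum(p < 0.05 for p in ps) / len(ps)
    print(f"N = {n:3d}: median p = {statistics.median(ps):.3f}, share p < .05 = {share_sig:.2f}")
```

With these settings, the median p-value tends to fall and the share of draws with p < .05 tends to rise as N grows, although the population difference is the same at every N. A smaller mean difference should show the same pattern, only more gradually.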
Focus on the results sections of these papers and discuss how they compare with one another.
How do the statistical approaches differ? Which papers focus more on general directional patterns, and which on more precise estimation? How are the results weighted and used as evidence?