How to achieve replicability: illustrations and guidelines at the hand of effect size sampling distributions

  • G.Y. Peters
  • R. Crutzen
  • S. Gruijters


Empirical research requires sampling data from the population under study. This means the outcomes of each scientific endeavour are to a degree uncertain, and therefore replication is never guaranteed. Two determinants of replication likelihood will be discussed using the sampling distribution of Cohen’s d. The first is the likelihood that randomization succeeds in generating equivalent groups in experimental designs. The second is the accuracy of effect size estimation in a study. It will be shown that even studies with acceptable power from a null hypothesis significance testing perspective yield very unreliable effect size estimates and often fail to produce equivalent groups. This will be illustrated using re-analysed data from the Reproducibility Project: Psychology. The implication is that traditional power analyses and sample size guidelines underestimate the sample sizes required to obtain findings that are likely to stand the test of replication, and so often cannot inform study planning. Therefore, functions to compute the sample sizes required to obtain replicable results have been implemented in the R package userfriendlyscience and will be introduced, and guidelines will be presented based on these computations. These guidelines make clear that expectations regarding sufficient sample sizes need to be adjusted considerably. This is important for researchers, and perhaps even more so for funders, who will have to get used to the considerably longer timeframes, slightly larger funds, and stricter requirements in terms of acceptable sample sizes that are necessary to build a replicable evidence base in psychology.
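
The gap between adequate power and accurate effect size estimation can be made concrete with a rough sketch. The Python code below is not the userfriendlyscience implementation. It relies on a common large-sample normal approximation to the sampling variance of Cohen's d for two equal-sized independent groups, and all function names are hypothetical, chosen only for this illustration. It compares the per-group sample size needed for a given power with the one needed for a given confidence interval half-width around d.

```python
import math
from statistics import NormalDist


def d_standard_error(d, n_per_group):
    """Approximate standard error of Cohen's d (assumption: large-sample
    normal approximation, two independent groups of equal size)."""
    n1 = n2 = n_per_group
    return math.sqrt((n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2)))


def ci_half_width(d, n_per_group, conf_level=0.95):
    """Approximate half-width of the confidence interval around d."""
    z = NormalDist().inv_cdf(0.5 + conf_level / 2)
    return z * d_standard_error(d, n_per_group)


def n_for_half_width(d, target_half_width, conf_level=0.95):
    """Smallest per-group n whose approximate CI half-width does not
    exceed the target (simple linear search)."""
    n = 2
    while ci_half_width(d, n, conf_level) > target_half_width:
        n += 1
    return n


def n_for_power(d, power=0.80, alpha=0.05):
    """Per-group n for a two-sided two-sample test (assumption: normal
    approximation, which gives slightly smaller n than a t-based analysis)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_beta) / d) ** 2)


if __name__ == "__main__":
    d = 0.5
    n_power = n_for_power(d)
    print("per-group n for 80% power:", n_power)
    print("CI half-width at that n:", round(ci_half_width(d, n_power), 2))
    print("per-group n for half-width 0.1:", n_for_half_width(d, 0.1))
```

Under these assumptions, a study powered at 80% to detect d = 0.5 still has a 95% confidence interval that extends roughly 0.35 on either side of the estimate, whereas narrowing that to 0.1 requires a per-group sample size more than ten times larger. Exact methods based on the noncentral t distribution will give somewhat different numbers.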