Tell your students: Null results are significant too
We all know that null (non-significant)* results are informative, useful, and not at all a bad thing, right? Right…
In an entirely rigorous scientific approach to surveying people’s perspectives on null results, I turned to statistics memes**. Just like academia, these memes neatly reflect the 5% alpha threshold: p < .05 and it’s publication party time, p > .05 and you cry a little.
Students often come face-to-face with their own personal null results towards the end of their research project. Usually the project is worth a good portion of their final grade and they feel the pressure to write a good report. When "good" includes some version of novelty or innovativeness, it’s understandable that they feel the pressure to "find something interesting" (read as: statistically significant) to discuss.
So, when the inevitable happens (p > .05***), a path to the dark side emerges. Some students feel fear: what if they made a mistake or somehow ruined their study? Others feel anger: all they see are significant p values and supported hypotheses, so why are they denied the same? Still more feel their burning hatred of statistics emerge again, born from a litany of frustrations throughout their training (not to mention math anxiety - see this twitter thread). Finally, many students feel suffering: at the loss of their favorite hypothesis, at the realization that their training has given them little experience of describing and interpreting null results, and in feeling that their "non-significant results give them nothing to write about"****.
Honestly, I can't blame them for feeling this way. The prevalence of Questionable Research Practices***** (e.g. Gopalakrishna et al. 2020) and the volume of student projects that "metamorphosize" from dull nulls into pretty supported results (e.g. O’Boyle et al. 2017) showcase that many established researchers embody the same reactions. Instead of focusing on this bleak outlook, we will keep our minds on supporting research students.
A highly accurate, and well-sampled, representation of researchers’ responses to null results.
Most supervisors will recognise some of the following from students' null result papers:
The limitations clearly relate to not finding a significant effect. Sometimes the writing is explicit enough to state that fixing these limitations would help find significant effects.
They deploy the oft-used "approaching statistical significance"****** to allow them to discuss results through the more familiar lens of a significant result. Again, to "have something to write about".
The conclusions read as if nothing has been learned about the relationship between the variables and the study itself was uninformative, without considering the possibility that there is no true effect.
Conversely, some conclude: we found no effect, therefore no effect exists (and perhaps all the previous research was actually a massive false positive).
I’ve often wondered how we can help our students and mentees. We can - and should - tell our trainees that a result does not have to be significant to be useful, or important, or "right", or even publishable. But, however strongly we reinforce this, students are faced with overwhelming evidence to the contrary. Their own readings of the literature, and every other subtle (or not so subtle) signal we send them about new and exciting (significant) results, usually obliterate that message by the time they’re analyzing their hard-collected data.
Instead of sinking into an academic existential dread, here are two ways I have tried to help students overcome the dark side of null results (Warning: Your mileage may vary).
For several students, a shallow dive into Bayes Factors******* and equivalence testing helped dramatically. These projects involved relatively simple models (e.g. in the realms of t-tests, correlations, and multiple regression), so the learning curve was not too steep. Thank you, JASP, for being an intuitive tool that lets students with minimal analysis and/or programming skills run these analyses. My students were able to write that "the evidence favored the null model" instead of "we did not find a significant effect", which seemed to empower them to "have something to talk about". More than this, they had much more confidence in their interpretations of the study, even if that interpretation was that no hypothesis was better supported by the data.
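For readers who have not met equivalence testing, here is a minimal sketch of the two one-sided tests (TOST) idea in Python. It is an illustration only, not what JASP runs: the function name `tost_two_sample`, the toy data, and the equivalence bounds are all made up for this example, and it uses a large-sample normal approximation rather than the t distribution a real analysis would use.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def tost_two_sample(x, y, low, high):
    """Two one-sided tests (TOST) for equivalence of two group means.

    Assumption: large samples, so a normal approximation stands in for
    the t distribution. `low` and `high` are the smallest differences
    we would still consider meaningful (chosen before seeing the data).
    """
    diff = mean(x) - mean(y)
    se = sqrt(stdev(x) ** 2 / len(x) + stdev(y) ** 2 / len(y))
    z = NormalDist()
    p_lower = 1 - z.cdf((diff - low) / se)  # tests H0: diff <= low
    p_upper = z.cdf((diff - high) / se)     # tests H0: diff >= high
    # Equivalence is claimed only if BOTH one-sided tests reject,
    # so the overall p value is the larger of the two.
    return diff, max(p_lower, p_upper)


# Hypothetical toy data: two groups with identical means.
group_a = [i % 5 for i in range(100)]
group_b = [(i + 2) % 5 for i in range(100)]

diff, p_wide = tost_two_sample(group_a, group_b, low=-1.0, high=1.0)
_, p_narrow = tost_two_sample(group_a, group_b, low=-0.1, high=0.1)
print(f"difference = {diff:.2f}")
print(f"p (bounds of +/-1.0) = {p_wide:.4f}")
print(f"p (bounds of +/-0.1) = {p_narrow:.4f}")
```

With the wide bounds the data let us reject any difference larger than one point in either direction; with the narrow bounds they do not. The same observed difference of zero supports different claims depending on what effect size we decided was worth caring about, which is exactly the kind of thing students can write about.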
In other student projects, we took to preregistration. Preregistration does not solve all problems (nor should it be expected to, or discussed as if it does), and much of this particular benefit could be achieved without any formal preregistration process. For us, and in more than one project, preregistration was a useful tool to explore several patterns of potential results and how we might interpret each one. It forced us to better consider the theory driving the study and how this interacted with our hypotheses and planned analyses (also see this nice blogpost for a similar non-preregistration argument). This did not entirely remove the sting when p > .05********. But, those students were more capable of discussing the results in reference to the theory they were testing and the report did not read as if the study was worthless or uninformative.
Broad improvements in the statistical training students undergo would also help. Whole communities have grown to support improving research training, for example a Framework for Open and Reproducible Research Training (https://forrt.org/). But, waiting for the glacial pace of reforming curricula to catch up with our current needs gets old fast. Note, students likely don't need more stats or more complex stats. Instead, they need more time dedicated to understanding how statistics work and how they can be used to make inferences.
In the meantime, I have a rapid-fire round of discussion points that have helped my students get to grips with interpreting nulls and, hopefully, feel less hopeless when they see p > .05*********. In no particular order, we could dedicate more time to: sampling variability and the dance of confidence intervals, meta-analyses, what a p value actually is, effect sizes, open science, statistical power, common statistical misconceptions (Greenland et al., 2016), and the fact that we should expect null effects in a line of studies even when the effect does exist (e.g. Lakens & Etz, 2017). Sharing papers reporting null results (maybe you have published your null results too?) gives students something tangible to grapple with other than being barraged by statistical significance. Avoiding valenced and judgemental language about results - "failed replications", "failed experiments", results are "negative" or "positive" - may help students feel more comfortable whatever the results. Finally, we can avoid preemptively making students feel they have missed a valuable opportunity by not telling them that we might be able to publish the study "if we find something interesting".
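The point about expecting nulls in a line of studies can be made concrete with a few lines of Python. This is a back-of-the-envelope sketch in the spirit of Lakens & Etz (2017), not a reproduction of their analysis: it assumes a true effect exists, that the studies are independent, and that each has the same statistical power (the 80% and four studies below are assumed values picked for illustration).

```python
from math import comb


def prob_k_significant(n_studies, k, power):
    """Binomial probability that exactly k of n independent studies
    reach significance when a true effect exists and each study has
    the given power."""
    return comb(n_studies, k) * power**k * (1 - power) ** (n_studies - k)


power = 0.80   # assumed power of each study
n_studies = 4  # assumed number of studies in the line of research

p_all_significant = prob_k_significant(n_studies, n_studies, power)
p_at_least_one_null = 1 - p_all_significant

print(f"P(all {n_studies} significant) = {p_all_significant:.4f}")
print(f"P(at least one null)   = {p_at_least_one_null:.4f}")
```

Under these assumptions, a set of four studies containing at least one null result is more likely than a clean sweep of significant findings, even though the effect is real.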
Most students have been told that their null results matter too; the often repeated phrase goes something like "even null results tell us something". Often it feels like we are merely paying lip service to an idealized world of research and academic publishing. But, at the risk of sounding preachy, I believe that with consideration and more training we might create a world in which null results are not demonized and avoided, and are instead added to the academic record to facilitate the scientific process. More immediately, maybe we can reduce some of our students’ p value anxiety.
*Sorry, Bayesians, this might not be for you
**Code for reproducibility check (https://www.google.co.uk/search?q=p+value+memes). You're welcome, Open Science.
***Cue thunder clap and ominous music
****Almost a literal quote from more than one student
*****Or Questionable Reporting Practices, depending on your personal preference
******For other amusing examples from the published literature, see https://mchankins.wordpress.com/2013/04/21/still-not-significant-2/
******* See, Bayesians, I hadn't forgotten about you
********Cue the second thunder clap
*********And that clap of thunder makes three
Lakens, D., & Etz, A. J. (2017). Too true to be bad: When sets of studies with significant and nonsignificant findings are probably true. Social Psychological and Personality Science, 8(8), 875-881.
Gopalakrishna, G., Riet, G., Vink, G., Stoop, I., Wicherts, J., & Bouter, L. (2020). Prevalence of questionable research practices, research misconduct and their potential explanatory factors: a survey among academic researchers in The Netherlands. https://www.nsri2020.nl/
Greenland, S., Senn, S. J., Rothman, K. J., Carlin, J. B., Poole, C., Goodman, S. N., & Altman, D. G. (2016). Statistical tests, P values, confidence intervals, and power: a guide to misinterpretations. European Journal of Epidemiology, 31(4), 337-350. https://doi.org/10.1007/s10654-016-0149-3
O’Boyle Jr, E. H., Banks, G. C., & Gonzalez-Mulé, E. (2017). The chrysalis effect: How ugly initial results metamorphosize into beautiful articles. Journal of Management, 43(2), 376-399.
Resources and links
Framework for Open and Reproducible Research Training (FORRT) https://forrt.org/