Content of review 1, reviewed on October 07, 2016

This paper is a very important contribution on how to improve the way we present statistical data. The reception of this paper has been quite astounding, with over 39,000 visits on the PLOS Biology website since 22 April. Astonishingly, my tweet on this paper (https://twitter.com/GaetanBurgio/status/590958042444800001) has attracted > 300 retweets, > 300 favorites and over 30,000 impressions. Additionally, a comment on this paper was featured today in Nature News and comments (http://www.nature.com/news/bar-graphs-criticized-for-misrepresenting-data-1.17383), which started trending strongly on Twitter. The response to this paper underlines how widespread bad habits in statistics and data representation are. I would like to take this unexpected opportunity to share a summary of my discussions on Twitter about this paper and my personal take on this story. Hopefully we can start an interesting and fruitful discussion on this forum. For once, it won't be about data manipulation and paper retraction!

I would like to make two general comments.

Firstly, bad statistics and bad habits are widespread throughout science, especially in the biological sciences. They undermine the reproducibility of data and experiments, which leads to a waste of public funds and of time spent trying to reproduce experiments. They take various forms, such as small sample sizes, p-hacking (http://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.1002106) or cherry-picking data. How many times have I reviewed papers with ridiculously low sample sizes or cherry-picked data? I don't really think I need to convince the readers of this discussion forum that this habit is epidemic.

Secondly, the level of statistical training among researchers in the biological sciences is often really poor. How many times have I seen students, postdoctoral researchers or even PIs who are just able to perform a t-test in Excel and have virtually no knowledge of basic statistics? We could discuss this topic endlessly. One specific issue I have come across, though, is that the teaching of statistics is often boring or unattractive to students. Some would disagree with this, but we can discuss it here.

More specifically on this paper now.

Some would argue that this paper is basically a revisit of Anscombe's quartet (http://en.wikipedia.org/wiki/Anscombe's_quartet), which is probably true. However, it is always very good and refreshing to see someone speaking out and trying to address the issue of data misrepresentation.
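For readers who have not met Anscombe's quartet, here is a minimal sketch in Python of the point it makes. The y-values are those of the published quartet. The helper name `summarize` and the choice of rounding are my own assumptions for illustration, and nothing here comes from the paper under review. The four datasets share the mean and variance that a bar graph with error bars would be built from, yet their medians already differ.

```python
from statistics import mean, median, variance

# y-values of the four datasets in Anscombe's quartet (Anscombe, 1973).
quartet_y = {
    "I":   [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68],
    "II":  [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74],
    "III": [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73],
    "IV":  [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89],
}


def summarize(values):
    """Hypothetical helper: the summary numbers behind a mean-and-error-bar
    plot (mean, sample variance), plus the median for comparison."""
    return {
        "mean": round(mean(values), 2),
        "variance": round(variance(values), 1),
        "median": median(values),
    }


summaries = {name: summarize(ys) for name, ys in quartet_y.items()}

for name, stats in summaries.items():
    print(name, stats)
```

The rounded mean and variance come out the same for all four datasets while the medians do not, which is exactly why showing the individual points (or at least a box plot) tells the reader more than a bar with an error bar.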

Why is the bar graph so dominant? I can see two reasons for this. Firstly, it is very convenient and easy to produce in Excel. Secondly, it has become a sort of standard data representation whose limitations nobody really questions. I am not an exception to this. This was one of the explanations retained by the journalist at Nature (see http://www.nature.com/news/bar-graphs-criticized-for-misrepresenting-data-1.17383).

One interesting point was raised during my discussions on Twitter: should the authors have provided R code instead of an Excel spreadsheet? I guess many would collapse or have a fit at seeing statisticians recommend Excel instead of R. To this I would argue that not everyone is proficient with R, and the reality is that many are unable to do basic statistics in R given that they don't even have the basics in statistics. Although I have been an R user for over 12 years, I would agree with the authors. In short, Excel is the way to go if we want to change the bad habits of people who are not proficient in statistics (this is basically the purpose of the paper).

Another interesting point: parametric or non-parametric? I would simply argue that this is not the point the authors are addressing in their paper. We are free to build on their publication and propose alternative methods of data representation (plots, tables or whatever) as long as they are easy for everyone to use.

Finally, where do we go from here? The authors rightly make three recommendations: encourage a more complete presentation of data, change data policies in journals and train investigators. I would argue that the issue is not simply a data representation story; it is about bad statistics. As reviewers of grants and papers, we should be at the forefront in changing these bad habits (including our own) and improving the reproducibility of experiments.

Overall, this paper is a must-read and so refreshing for all of us. I would like to thank the authors for such a fantastic contribution.

Dr Gaetan Burgio, The Australian National University, Canberra, Australia.

Source

    © 2016 the Reviewer (CC BY 4.0).

References

    Weissgerber, T. L., Milic, N. M., Winham, S. J., Garovic, V. D. 2015. Beyond Bar and Line Graphs: Time for a New Data Presentation Paradigm. PLoS Biology.