Content of review 1, reviewed on July 08, 2022

The authors report an update to the software package JASP that fixes a peculiar behavior of Bayesian ANOVAs. This behavior arises because previous implementations did not include random slopes in repeated-measures designs, which can yield substantially inflated Type I and Type II error rates. The issue behind this fix is substantial: it affects virtually all previous work using this package in which researchers analyzed repeated-measures designs with two or more factors. It is therefore commendable that the authors fixed their package accordingly!

I wonder, however, whether a journal publication on this issue is actually the right way forward. The odd behavior of Bayesian repeated-measures ANOVA has been reported in detail before, with direct reference to JASP as well as to the two R packages BayesFactor and brms. See, for example, this recent article:

Oberauer, K. (2022). The importance of random slopes in mixed models for Bayesian hypothesis testing. Psychological Science, 33(4), 648-665.

This article seems to cover nearly everything that is discussed in the present manuscript (without being cited, as far as I can see). What the present article adds is the reassuring note that the algorithm in JASP appears to have been fixed.

A follow-up publication on this topic should ideally present additional insights beyond what is already available in the literature. I could see the authors successfully expanding their article along these lines, but I feel that substantial extensions would be required for a standalone publication, ideally discussing some of the recent literature in the field. One way forward would be to address other critical limitations of current computational implementations. An example is documented in one of my own articles, which shows a surprisingly high variability of Bayes factor estimates across re-runs of the same analysis on identical data:

Pfister, R. (2021). Variability of Bayes Factor estimates in Bayesian analysis of variance. The Quantitative Methods for Psychology, 17(1), 40-45.
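To illustrate the mechanism in a language-agnostic way: Bayes factors in packages such as BayesFactor are obtained by stochastic (Monte Carlo) integration of the marginal likelihood, so re-running the identical analysis yields a slightly different estimate each time. The following toy sketch (plain Python; a deliberately simplified point-null problem of my own construction, not the actual BayesFactor algorithm) shows the effect:

```python
import math
import random

def normal_pdf(x, mean=0.0, sd=1.0):
    """Density of a normal distribution at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def mc_bf01(y, n_draws, seed):
    """Toy BF01 for H0: theta = 0 vs H1: theta ~ N(0, 1), with one datum y ~ N(theta, 1).

    The marginal likelihood under H1 is estimated by simple Monte Carlo
    averaging over prior draws, so the resulting Bayes factor fluctuates
    from run to run even though the data never change.
    """
    rng = random.Random(seed)
    m1 = sum(normal_pdf(y, mean=rng.gauss(0.0, 1.0)) for _ in range(n_draws)) / n_draws
    m0 = normal_pdf(y, mean=0.0)  # exact likelihood under the point null
    return m0 / m1

# Analytic value for y = 1: N(1; 0, 1) / N(1; 0, sqrt(2)) = sqrt(2) * exp(-0.25) ≈ 1.10
bf_a = mc_bf01(1.0, 2000, seed=1)
bf_b = mc_bf01(1.0, 2000, seed=2)
```

With a finite number of draws, bf_a and bf_b differ even though the data are identical; increasing n_draws shrinks, but never removes, this run-to-run variability, which is the phenomenon documented in the article above.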

Another viable avenue for increasing the contribution of this work would be to add conceptual and mathematical detail about random-intercept vs. random-slope models in general. These concepts are quite common in linear mixed-effects modeling, but I am not sure how accessible they are to researchers who are accustomed to ANOVA terminology.
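To make the distinction concrete for ANOVA-minded readers, here is a minimal data-generating sketch (plain Python, with arbitrary illustrative parameter values of my own choosing): each subject draws a random intercept and, in the fuller model, also a random slope, i.e., a personal condition effect. The random-intercept-only model is simply the special case sd_slope = 0.

```python
import random
import statistics

def simulate_subject(rng, n_trials, sd_intercept, sd_slope, effect=0.0, sd_noise=1.0):
    """Return one subject's mean difference (condition 1 minus condition 0).

    Each subject has a random intercept and a random slope: the subject's
    personal condition effect is `effect + slope`. Setting sd_slope = 0
    recovers the random-intercept-only model criticized in the manuscript.
    """
    intercept = rng.gauss(0.0, sd_intercept)
    slope = rng.gauss(0.0, sd_slope)
    means = []
    for x in (0.0, 1.0):  # the two within-subject conditions
        trials = [intercept + (effect + slope) * x + rng.gauss(0.0, sd_noise)
                  for _ in range(n_trials)]
        means.append(sum(trials) / n_trials)
    return means[1] - means[0]

rng = random.Random(7)
# With sd_slope > 0 and no true effect, per-subject condition effects
# spread far beyond what trial-level noise alone would produce.
diffs = [simulate_subject(rng, 200, sd_intercept=1.0, sd_slope=0.8)
         for _ in range(30)]
spread = statistics.stdev(diffs)
```

The spread of these per-subject effects is exactly the variance component that a random-intercept-only specification omits, which is why it can be badly overconfident about the population-level effect.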

In the meantime, it might be good to use quicker channels to inform users about this behavior of the software. For instance, the corresponding Release Notes of JASP 0.16.3 (16 June 2022) only seem to mention that the Bayesian ANOVA functions have been “improved” without describing the scale of the issue.

If the authors were to expand the article, please find some additional suggestions below.

Signed,
Roland Pfister

Additional suggestions and final quibbles:

  • I could not really connect the plots to the methods and analyses. The authors state that “The raw data of all nineteen participants are displayed in Figure 1. The top left panel shows the average response times of the Break and Stroop trials”. I assume this refers to post-break and post-Stroop trials (because there were no responses in Break trials), but there are far more data points per condition than the 19 dots we should expect for this experiment. I am also not sure what information the boxplots and density estimates in the upper panel convey – to me they seem to add more visual noise than they help in making sense of the data (they are useful for the difference scores in the lower panels, of course).

  • “the [Bayesian] results indicate little evidence regarding the significant effect of congruency, but strong evidence for the non-significant effect of PT” -> Would be good to include an explicit plausibility check against the descriptive data, where PT has a mean difference of essentially zero. This is implied between the lines, but it might be good to make this point more explicit because it highlights the issues reported here.

  • In the abstract and also within the main text, the authors promise that “The paper concludes with a discussion on how this proposed adjustment impacts previously published results of Bayesian repeated-measures ANOVAs.” Such a discussion seems quite important to me but I could not find it in the manuscript. The only statement appears to be that “The degree to which the non-standard model specification of the Bayesian repeated-measures ANOVA in BayesFactor (with the function anovaBF()) and JASP has affected results published in the literature, unfortunately, remains unclear.” More details on how to re-check previous results and a call to action for actually doing so would be much appreciated.

  • If I understand correctly, the authors chose to implement Type II sums of squares for the Bayesian ANOVA and Type III sums of squares for the classical ANOVA in JASP. They add a short discussion of marginality, which is a heated topic, especially in stats and R forums across the internet (with many people holding surprisingly strong opinions on it). Along these lines, it was somewhat surprising that the two procedures would have different defaults within the very same program. Furthermore, only the frequentist default appears to be changeable via user input, whereas the Bayesian one is not, if I read Appendix B correctly. I can see that Type III is implemented in JASP to remain consistent with what SPSS would compute, but this seems to introduce unnecessary differences that have nothing to do with the particular statistical framework (Bayesian vs. NHST). Would it not be preferable to use a consistent default here?

  • The provided GitHub link to the data did not work for me (https://github.com/jasp-stats/jaspdesktop/blob/stable/Resources/Data%20Sets/Data%20Library/3.%20ANOVA/Stroop.csv)

  • Regarding the question “Q: Have you made available any and all materials necessary to reproduce your experiments, analyses, or other paper contents?” the authors state that there were no data. Given that the whole point of this work is to showcase a certain type of analysis, it would be good to spell out how the present analyses were done, e.g., by sharing materials such as analysis code to reproduce the reported analyses.
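On the Type II vs. Type III point above, the difference is easiest to see as a comparison of nested regression models under effect (sum-to-zero) coding: Type II tests a factor against the additive model, whereas Type III tests it against the full model including the interaction. A small self-contained sketch for an unbalanced 2×2 design (plain Python with toy data of my own; this is the textbook model-comparison definition, not JASP's implementation):

```python
def solve(A, b):
    """Solve the linear system A x = b by Gauss-Jordan elimination with pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [mr - f * mc for mr, mc in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def rss(cols, y):
    """Residual sum of squares of the least-squares fit with the given predictor columns."""
    k = len(cols)
    XtX = [[sum(u * v for u, v in zip(cols[i], cols[j])) for j in range(k)] for i in range(k)]
    Xty = [sum(u * v for u, v in zip(cols[i], y)) for i in range(k)]
    beta = solve(XtX, Xty)
    fitted = [sum(beta[i] * cols[i][t] for i in range(k)) for t in range(len(y))]
    return sum((yt - ft) ** 2 for yt, ft in zip(y, fitted))

# Toy unbalanced 2x2 design, factors A and B in effect (+1/-1) coding.
y   = [1, 2, 3,  2, 3,  4, 5, 6,  5, 6]
a   = [-1, -1, -1, -1, -1,  1, 1, 1,  1, 1]
b   = [-1, -1, -1,  1, 1,  -1, -1, -1,  1, 1]
one = [1] * 10
ab  = [ai * bi for ai, bi in zip(a, b)]  # interaction column

# Type II: factor A tested against the additive model (interaction excluded).
ss_a_type2 = rss([one, b], y) - rss([one, a, b], y)           # = 22.5 here
# Type III: factor A tested with the interaction retained in both models.
ss_a_type3 = rss([one, b, ab], y) - rss([one, a, b, ab], y)   # = 21.6 here
```

For this unbalanced toy dataset the two conventions give different sums of squares for factor A (22.5 vs. 21.6); in balanced designs they coincide, which is precisely why the choice of default only bites in the unbalanced case and deserves a consistent treatment across the two frameworks.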

Source

    © 2022 the Reviewer.

Content of review 2, reviewed on January 20, 2023

The authors have revised their manuscript comprehensively, and I think it is almost ready for publication. The last lingering issue for me is that I still find the discussion of Oberauer (2022) misleading, to say the least. The paper is mentioned in passing in two places, if I see correctly:

Intro: “Pivoting to the MRE-model specification is also consistent with recommendations within the broader framework of mixed models (Barr et al., 2013; Oberauer, 2022; van Doorn, Haaf, et al., 2022)“
and
Discussion: “SFR-ANOVA has only recently been proposed to address individual differences; it is subject of controversial debate (also see Oberauer, 2022), new to most analysts, and appropriate follow-up analyses are not readily available”

From these statements I would not be able to figure out that the paper by Oberauer actually makes the very same point as the present article (and very convincingly so). He describes how pure random-intercept models can lead to questionable results, and the present article also states that pure random-intercept models can lead to questionable results. Yes, there are some differences when taking a closer look (e.g., using simulated vs. real datasets), but it is about as similar as a minimally publishable increment would allow.

The authors seem to acknowledge this somewhat in the cover letter – “Our contributions are in line with the work of Oberauer and therefore we view our efforts as mutually reinforcing.“ – but this doesn’t seem to show in the manuscript.

-> Please give some credit where credit is due. It doesn’t take much – a sentence or two would be enough – but I find it important to discuss the literature in a fair and unbiased manner.

[P.S.: The same holds true for Pfister (2021), but I am not going to insist on further discussion since this is my own article].

Signed,
Roland Pfister

Source

    © 2023 the Reviewer.

References

    van den Bergh, D., Wagenmakers, E.-J., & Aust, F. (2023). Bayesian repeated-measures analysis of variance: An updated methodology implemented in JASP. Advances in Methods and Practices in Psychological Science.