Content of review 1, reviewed on May 09, 2022

This manuscript advocates for the systematic use of simulations in the use and development of complex statistical analyses. It first demonstrates that this practice is not common, lacking even in methods papers. It then presents a framework for simulation studies, using a worked example.

Generally I find this a well written and very useful paper, and I strongly endorse its message.

General comment:

For a topic that has very large scope, I find the manuscript to be quite specific/narrow in focus in two ways.

  1. It is very 'ecology' focussed.
    As a predominately evolutionary biologist, it felt as though the manuscript is not really aimed at me, although I work entirely with hierarchical models (including method development). Just looking through the references, for example, they are largely from the occupancy model literature. Clearly the suggested framework would work just as well in any field, but it seems very ecology focussed nonetheless. This mostly comes from the strong focus on a single very detailed and specific ecological example, which I don't think would be easily followed by those unfamiliar with occupancy models. This isn't a method particularly familiar to me, and I would have to invest quite some time to follow the intricacies of the example.

Having said that, the more general parts (Objectives, Simulation settings, Model fitting) in the STUDY-SPECIFIC SIMULATIONS and GENERAL PROPERTY SIMULATIONS sections are clearly articulated, well written and informative.

  2. It requires a reasonably high level of statistical knowledge.
    This largely comes from the example used, which is a pretty complex model. This is perhaps the point, as it is a paper on validating complex models, but I found myself skipping over the examples, as they were very specific to the complex model in question and not very generalisable. I also don't have the expertise in this area to assess whether some of the simulations in question are appropriate. I think the treatment of simulations also assumes some level of statistical competence - for example, figure B1.1 would be largely uninterpretable by many biologists I know. Again, maybe this is fine as the target audience will likely have this knowledge.

I think this specificity will limit the audience to those who are generally happy with complex methods, and I think it won't be accessible to many less quantitative empiricists. To tie this to the manuscript's introduction: I don't think this paper will be used by the majority of authors using hierarchical models in Nature Ecology and Evolution, but it will be applicable to all the authors publishing new statistical techniques in Methods in Ecology and Evolution. I don't necessarily see this as a problem, as I think providing a framework for more transparent and reproducible method development is valuable, and so I think the manuscript is useful and highly suitable for MEE. It might be worth considering a slightly broader and/or simpler example, to generally make the manuscript more accessible.

Specific comments:

To me 'Determining statistical properties' seems to fit better in GENERAL PROPERTY SIMULATIONS than in STUDY-SPECIFIC SIMULATIONS. I can see that if you want to assess bias of a model, and you want to use model estimates from observed data to parametrise the simulations, then it kind of becomes study specific, but estimating precision and coverage after the fact seems a bit futile to me, and more in line with the study design section.
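
For concreteness, the three properties referred to here can be computed from a simulation study as in the minimal sketch below. This is a toy example of my own (estimating a normal mean), not the manuscript's occupancy model; the function name `run_simulation` and all parameter values are assumptions chosen only for illustration.

```python
import random
import statistics


def run_simulation(n_sims=2000, n_obs=30, true_mean=2.0, true_sd=1.0, seed=42):
    """Simulate n_sims datasets and return (bias, precision, coverage).

    Toy setting (an assumption, not the manuscript's model): the estimator
    is the sample mean of n_obs normal draws.
    """
    rng = random.Random(seed)
    estimates = []
    covered = 0
    for _ in range(n_sims):
        data = [rng.gauss(true_mean, true_sd) for _ in range(n_obs)]
        est = statistics.fmean(data)
        se = statistics.stdev(data) / n_obs ** 0.5
        # Normal-approximation 95% interval; a t-quantile would be slightly wider.
        lower, upper = est - 1.96 * se, est + 1.96 * se
        covered += lower <= true_mean <= upper
        estimates.append(est)
    bias = statistics.fmean(estimates) - true_mean  # average error of the estimator
    precision = statistics.stdev(estimates)         # SD of the sampling distribution
    coverage = covered / n_sims                     # share of intervals containing the truth
    return bias, precision, coverage


bias, precision, coverage = run_simulation()
print(f"bias={bias:+.4f}  precision (SD)={precision:.4f}  coverage={coverage:.3f}")
```

All three quantities need the true parameter value, or at least an assumed data-generating process, which is why they read to me as properties of the method and design rather than of one fitted dataset.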

l 165 'Said more succinctly, a simulation study is simply an approach for drawing samples from a distribution of a desired statistic, and so identifying which statistics will help answer a particular question is critical.'
Perhaps split this into two sentences or rephrase; I had to reread it several times to get the point (which is an important one!).

Box 1 - I don't think the meaning of the superscripts is explained anywhere. This also might be a more general issue throughout the manuscript.

l 602 'This provides a straightforward way to assess whether or not more simulations are needed.'
I don't follow this - the SD of the sampling distribution represents the expected SE of the parameter being estimated. Increasing the number of simulations won't change this estimate; it will only increase its precision. I'm not sure how looking at the SD of the sampling distribution will help to assess whether enough simulations have been run?
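
To illustrate the distinction, here is a minimal sketch of my own (a toy normal-mean example, not the manuscript's model; the function names and settings are assumptions). It compares the SD of the sampling distribution with the Monte Carlo standard error of the mean estimate, taken here as SD / sqrt(number of simulations), across different numbers of simulations.

```python
import random
import statistics


def simulate_estimates(n_sims, n_obs=50, true_mean=0.0, true_sd=1.0, seed=1):
    """Return n_sims sample means, each from a simulated dataset of n_obs normal draws."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.gauss(true_mean, true_sd) for _ in range(n_obs))
        for _ in range(n_sims)
    ]


def summarise(estimates):
    """Return (mean estimate, SD of sampling distribution, Monte Carlo SE of the mean)."""
    sd = statistics.stdev(estimates)
    mcse = sd / len(estimates) ** 0.5
    return statistics.fmean(estimates), sd, mcse


# Same toy design, three different numbers of simulated datasets.
results = {n: summarise(simulate_estimates(n)) for n in (100, 1000, 10000)}
for n, (mean_est, sd, mcse) in results.items():
    print(f"n_sims={n:>6}  mean={mean_est:+.4f}  sd={sd:.4f}  mcse={mcse:.4f}")
```

In this toy case the SD should stay close to its theoretical value of 1/sqrt(50) however many simulations are run, whereas the Monte Carlo SE shrinks with the number of simulations. It is the latter, not the SD itself, that seems to me the natural quantity for judging whether more simulations are needed.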

l 529 'We also argue that the inclusion of model code increases reproducibility and transparency in an age where open science in ecology and evolutionary biology is gaining traction, and by including code that simulates dataset and fits them to a statistical model, it will open doors to understanding the assumed data generation process underlying our statistical inferences. '
I think some references to the open science literature (of which there are many!) would be appropriate here (and possibly more generally throughout the manuscript). Also, I don't think it needs to be argued that inclusion of model code increases reproducibility and transparency; I think it is safe to say that is a fact!

There seem to be various formatting problems with the equations in Box 2.

Table S1/S2
These are very difficult to look through. I would suggest using symbols, preferably of different colours, instead of 'yes' and 'no' to aid readers looking at the tables. It might also be helpful to provide some summaries.

Joel Pick

Source

    © 2022 the Reviewer.

References

    V., D. G., Ephraim, H., W., M. D. A. 2023. A practical guide to understanding and validating complex models using data simulations. Methods in Ecology and Evolution.