Content of review 1, reviewed on April 02, 2024
The manuscript "Optimising Species Distribution Models: sample size, positional error, and sampling bias matter" discusses critical concerns for developing reliable species distribution models (SDMs), namely sample size, positional error, and sampling bias. The authors have undertaken a comprehensive review of existing knowledge and propose a list of recommendations for dealing with these concerns in SDMs. While the manuscript contributes valuable insights to the field of biogeography, clarification, expansion, and additional considerations in several areas could strengthen it:
Sample size: Although various studies have suggested minimum or effective sample sizes for developing a reasonable model, how well a sample represents the environmental range of a species matters at least as much. This depends on how the records are distributed along the environmental gradients, not only on how many records there are. Because many available datasets are geographically biased, their records are unlikely to be evenly distributed over the environmental range, which can strongly affect the sample size required for a reliable model. For instance, there is no guarantee that a sample of 50 (or even 150) records is sufficient: a model trained on far fewer records that are evenly distributed across the entire species range may be more reliable than a model trained on a larger dataset that does not represent the entire range.
Lines 147-154: The conclusion regarding the adequacy of sample sizes for reliable models is not entirely clear. The authors should provide specific examples or case studies that led to their conclusions, which would offer practical insights to readers.
Lines 158-163: Further clarification is needed on the role of sample size in the capacity of models to identify relevant variables for explaining distributions of species. Including a discussion on the balance between model complexity and the risk of overfitting or underfitting could elucidate this point.
Although the authors mention the number of predictors, an important related point is missing: the relationship between sample size and the number of predictors a model can support, known as the EPV (Events Per Variable) rule. As a rule of thumb, studies have specified a minimum of 5 (or 10) records per predictor variable for a reliable model.
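To make the arithmetic behind this rule of thumb concrete, a minimal sketch follows. The function names are hypothetical, "events" are assumed here to be occurrence records, and the thresholds of 5 and 10 are the rule-of-thumb values mentioned above rather than values taken from the manuscript.

```python
def max_predictors(n_records: int, epv: int = 10) -> int:
    """Largest number of predictors that keeps records-per-predictor >= epv.

    Hypothetical helper for illustration; assumes each occurrence record
    counts as one "event".
    """
    if n_records < 0 or epv <= 0:
        raise ValueError("n_records must be >= 0 and epv must be > 0")
    return n_records // epv


def min_records(n_predictors: int, epv: int = 10) -> int:
    """Smallest number of records needed to support n_predictors at a given EPV."""
    if n_predictors < 0 or epv <= 0:
        raise ValueError("n_predictors must be >= 0 and epv must be > 0")
    return n_predictors * epv


if __name__ == "__main__":
    # Sample sizes of 50 and 150 are the ones discussed in this review.
    for n in (50, 150):
        for epv in (5, 10):
            print(f"n={n}, EPV={epv}: at most {max_predictors(n, epv)} predictors")
```

Under these assumptions, a dataset of 50 records would support at most 5 predictors at EPV = 10, which shows why a nominally adequate sample size can still be too small for a model with many predictors.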
Positional error: The authors need to clarify what they mean by positional error, since it is not necessarily the same as positional uncertainty. In most situations they appear to refer to positional uncertainty, for which the term positional error may not be correct.
Line 93: “… studies reported minimum values when …”. Minimum values of what? If you mean sample size, replace “values” with “sample sizes” and rephrase accordingly to avoid confusion.
Source
© 2024 the Reviewer.
References
Vitezslav, M., Manuele, B., Ruben, R., Rodolphe, D., Jonathan, L., G., M. R., J., L. J., Neftali, S., Vincent, L., F., C. A., Vojtech, B., Petr, B., Duccio, R., Michele, T., Salvador, A., Matej, M., Dominika, P., Katerina, G., Jiri, P., Elisa, M., Alejandra, Z., Lukas, G., Francois, L., Matilde, M., Marco, M., Roberto, C. G., Jan, W., Petra, S. 2024. Optimising occurrence data in species distribution models: sample size, positional uncertainty, and sampling bias matter. Ecography.