Content of review 1, reviewed on June 16, 2022

I reviewed the manuscript entitled “Pervasive impacts of climate change on the woodiness and ecological generalism of dry forest plant assemblages”. This research aimed to study the effect of climate change on the distribution of Caatinga plant species and the resulting changes in beta diversity. I found the research objectives very interesting and important for showing the potential future impact of climate change on the Caatinga ecoregion. The analysis of model outputs is suitable for addressing the research objective, and the figures are nice. However, I found several theoretical and methodological issues that make these results unreliable. My main concerns are: 1) I think using the term “assemblages” in the manuscript is incorrect because SDMs do not model assemblages and usually even fail to predict them. 2) The occurrence database is incomplete and poorly cleaned. 3) Modeling the distribution of terrestrial plant species using only climate variables is inadequate because edaphic properties are an essential part of plant species niches. 4) The authors did not evaluate and exclude model extrapolation, which is extremely important in research evaluating climate change. 5) The way future distributions were handled was quite simple for a scenario of natural species dispersal. 6) Information is lacking for several modeling steps. You can find my detailed comments below. I have tried to provide extensive literature and resources to help you overcome these issues.

Comment about the term “assemblages”. I disagree with using the term assemblages in this research because you are not modeling assemblages (as in JSDM or SESAM approaches; Guisan & Rahbek, 2011; Ovaskainen & Abrego, 2020; see also Box 1 in Pollock et al., 2020). Species assemblages result from various processes, such as species dispersal, habitat filtering, and competition. SDMs model species individually and do not incorporate any information about the attributes and processes that structure assemblages. This is why stacked SDMs are simply the sum of their parts and generally fail to estimate ecological assemblages and richness (Aranda & Lobo, 2011; Dubuis et al., 2011; Pottier et al., 2013; Zurell et al., 2020), as well as the effect of climate change on assemblages (Di Febbraro et al., 2018). For these reasons, I recommend replacing the term “assemblage” with other wording throughout the manuscript (except in the Introduction) and stating explicitly that you are not modeling assemblages.

L93. It is not a good idea to use GBIF as the sole source of occurrence data. Although GBIF is perhaps the biggest biodiversity data aggregator, many other sources of occurrences do not share their data with GBIF (even large databases like iDigBio and iNaturalist; see Feng et al., 2022). You should gather as many occurrences as possible to better represent the niche of your species. Integrating occurrence data from different sources is critical in poorly sampled areas such as the Neotropics. So, please rebuild your occurrence database. Some important sources of plant occurrences in the Neotropics are, e.g., iDigBio, NeoTropTree, DryFlor, and BIEN, and, specifically in Brazil, SpeciesLink, ICMBIO, and SiBBr (an additional comment: if you use GBIF, you need to generate a DOI).

L95-96. Occurrence data cleaning is essential for constructing SDMs because species occurrences are the only real data used in SDMs. However, this step was barely addressed and poorly explained. What kind of error do you mean by “georeferencing errors”? Missing latitude and longitude? Or occurrences georeferenced to country centroids? What coordinate precision was used? And what is an “uncertain identification”? What about the temporal aspect of the occurrence data? Which taxonomic authority was used to update and homogenize species names? I strongly recommend using a more rigorous data cleaning procedure that addresses geographical, temporal, and taxonomic issues. Several R packages and papers have been published for this kind of data cleaning (Grenié et al., 2022; Jin & Yang, 2020; Ribeiro et al., 2022; Rodrigues et al., 2022; Zizka et al., 2019).
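
To make the kind of checks I have in mind concrete, here is a minimal sketch in Python. It is only an illustration, not the procedure of any of the packages cited above: the function name `clean_occurrences`, the record keys, and the `min_year` cutoff are all hypothetical choices that would need to be justified for the actual data.

```python
def clean_occurrences(records, min_year=1970):
    """Return records that pass basic geographic, temporal and duplicate checks.

    Each record is a dict with keys 'species', 'lat', 'lon' and 'year'.
    min_year is an illustrative cutoff, not a recommended value.
    """
    seen = set()
    kept = []
    for rec in records:
        lat, lon, year = rec.get("lat"), rec.get("lon"), rec.get("year")
        # Geographic checks: missing, out-of-range, or (0, 0) coordinates.
        if lat is None or lon is None:
            continue
        if not (-90 <= lat <= 90 and -180 <= lon <= 180):
            continue
        if lat == 0 and lon == 0:
            continue
        # Temporal check: undated records or records older than the cutoff.
        if year is None or year < min_year:
            continue
        # Duplicate check: same species at exactly the same coordinates.
        key = (rec.get("species"), lat, lon)
        if key in seen:
            continue
        seen.add(key)
        kept.append(rec)
    return kept
```

A real workflow would add checks that this sketch omits, such as country or capital centroids, coordinate precision, and taxonomic harmonization against a stated authority.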

L102-103. Why did you exclude species with fewer than 15 records? That decision needs a justification.

L117-133. Modeling terrestrial plant species with climate variables alone is not correct. Edaphic chemical and physical properties determine a vital part of the niche of terrestrial plants. The importance of using edaphic variables to model plant species niches has been demonstrated and discussed several times in the SDM literature (Beauregard & Blois, 2014; Bertrand et al., 2012; Dubuis et al., 2013; Thuiller, 2013; Tomlinson et al., 2020; Velazco et al., 2017). In the context of your research, which aims to evaluate the effect of climate change, it is crucial to use edaphic variables because many edaphic properties (soil depth, bulk density, concentration of clay, silt and sand, etc.) change extremely slowly over time (thousands of years), which may result in the detection of edaphoclimatic refuges (Bertrand et al., 2012). In addition, the Caatinga region has interesting edaphic properties because its aridity has resulted in poorly developed soils. Therefore, I recommend rebuilding your SDMs using edaphic variables for the terrestrial plant species (that is, excluding epiphytes and aquatic plants), perhaps those related to physical properties. SoilGrids (https://soilgrids.org) is an interesting source.

L137-139. You need to justify using this kind of pseudo-absence method. Environmentally biased pseudo-absences are more appropriate for modeling invasive species (see Hattab et al., 2017).

L149-151. Because you will project models onto different time periods, you need to evaluate model transferability. Using random fold cross-validation is not appropriate for evaluating model transferability. Instead, I recommend using geographically or environmentally structured partition methods (Helmstetter et al., 2020; Roberts et al., 2017; Valavi et al., 2019; Wenger & Olden, 2012).
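
As an illustration of what a geographically structured partition means, the following Python sketch assigns occurrences to two folds on a checkerboard of square blocks. It is a simplified sketch under my own assumptions (the function names and the block size in degrees are hypothetical) and is not the implementation of any of the packages cited above.

```python
import math

def checkerboard_fold(lon, lat, block_size_deg):
    """Return fold 0 or 1 for a point, alternating between adjacent blocks."""
    col = math.floor(lon / block_size_deg)
    row = math.floor(lat / block_size_deg)
    return (col + row) % 2

def split_by_fold(points, block_size_deg):
    """Group (lon, lat) points into the two checkerboard folds."""
    folds = {0: [], 1: []}
    for lon, lat in points:
        folds[checkerboard_fold(lon, lat, block_size_deg)].append((lon, lat))
    return folds
```

Models would be trained on one fold and evaluated on the other, so that test data are spatially separated from training data. The block size should reflect the spatial autocorrelation of the predictors, which this sketch does not estimate.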

L143-145. There is no detailed description of how the algorithms were used. Models like Maxent and Random Forest have several hyperparameters. What values were used? Were interactions and polynomial terms used for GLM? Was predictor variable selection performed for GAM or GLM? For Maxent, not even the number of background points was specified. None of this information is in the manuscript.

L135-168. Ecological niche models. There are two aspects of your modeling protocol that really concern me. 1) The lack of evaluation and correction of model extrapolation. Extrapolation should be evaluated and excluded (e.g., by truncating the models; see Montti et al., 2021; Owens et al., 2013; Stohlgren et al., 2011) for current and future models. Algorithms such as Random Forest, GLM, GAM and Maxent can predict unrealistic suitability values for projection data far from the training conditions. This is not trivial, especially for large study areas and future model projections. There are many metrics for measuring extrapolation that you can use; I suggest the following articles (Bouchet et al., 2020; Elith et al., 2010; Engler & Rödder, 2012; Mesgaran et al., 2014; Owens et al., 2013; Zurell et al., 2012). 2) The second point that concerns me is how the future distribution of the species was treated. Delimiting a species’ future distribution is challenging, and many techniques exist (Briscoe et al., 2019). I appreciate the effort to restrict the distributions to the calibration area, but I believe that this is not enough. With your approach, future distributions will likely include suitability patches that are inaccessible through natural dispersal, because the areas used for calibration may be large. Therefore, a more conservative solution would be to constrain the future distribution based on a buffer of XX km around the present suitability patches. In addition, I believe that in work evaluating the effect of climate change on species and regions that lack sufficient data to simulate more realistic dispersal processes (e.g., with cellular automata), it is worthwhile to show different dispersal scenarios that reflect potential dispersal realities. Thus, establishing a no-dispersal scenario could also be interesting.
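
To illustrate what I mean by evaluating and excluding extrapolation, here is a minimal Python sketch of the simplest possible check: a cell is flagged when any predictor falls outside the range observed in the calibration data, and its suitability is masked. This univariate range check is a simplification chosen for illustration; it is not MOP or any of the other metrics in the articles cited above, and all function names are hypothetical.

```python
def training_ranges(training_env):
    """Min and max of each predictor over the calibration data (rows = sites)."""
    columns = list(zip(*training_env))
    return [(min(col), max(col)) for col in columns]

def is_extrapolation(cell_env, ranges):
    """True if any predictor of the cell lies outside its training range."""
    return any(v < lo or v > hi for v, (lo, hi) in zip(cell_env, ranges))

def truncate_suitability(suitability, projection_env, training_env):
    """Replace suitability with None wherever the projection extrapolates."""
    ranges = training_ranges(training_env)
    return [
        None if is_extrapolation(env, ranges) else s
        for s, env in zip(suitability, projection_env)
    ]
```

The cited metrics go further, for example by accounting for novel combinations of predictors that each lie within their individual ranges, which this sketch cannot detect.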

L152-157. I also recommend using a threshold-independent metric like Fpb.

L161-167. I read Mendes et al. (2020) and noticed that the authors named each constraining approach. Please mention the name of the constraining method used.

L158-160. How were the different GCMs processed? Did you build an ensemble across GCMs for a given SSP?

L170-204. “Spatial patterns of beta-diversity, woodiness, and ecological generalism”. This analysis was very interesting. Congratulations!

L170. Just use the Sørensen index. In addition, these performance values are not informative because you did not evaluate model transferability. See my comment on L149-151.
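
For reference, the Sørensen index has the general form 2a / (2a + b + c). The Python sketch below is a hedged illustration with a hypothetical function name. When the index is used for model evaluation, I assume a is the number of true positives, b the false positives, and c the false negatives; when it is used to compare two species lists, a is the number of shared species and b and c are the species unique to each list.

```python
def sorensen(a, b, c):
    """Sørensen index 2a / (2a + b + c) from shared (a) and unshared (b, c) counts."""
    denominator = 2 * a + b + c
    if denominator == 0:
        raise ValueError("the index is undefined when all counts are zero")
    return 2 * a / denominator
```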

L206. “We modeled occurrence records of 2,841 Caatinga plant species,” You did not model occurrence records; you modeled species. Please correct this sentence.

The figures are amazing, nice, and informative. Congratulations.

References

Aranda, S. C., & Lobo, J. M. (2011). How well does presence-only-based species distribution modelling predict assemblage diversity? A case study of the Tenerife flora. Ecography, 34(1), 31–38. https://doi.org/10.1111/j.1600-0587.2010.06134.x
Beauregard, F., & Blois, S. (2014). Beyond a climate-centric view of plant distribution: Edaphic variables add value to distribution models. PLoS ONE, 9(3), e92642. https://doi.org/10.1371/journal.pone.0092642
Bertrand, R., Perez, V., & Gégout, J.-C. (2012). Disregarding the edaphic dimension in species distribution models leads to the omission of crucial spatial information under climate change: The case of Quercus pubescens in France. Global Change Biology, 18(8), 2648–2660. https://doi.org/10.1111/j.1365-2486.2012.02679.x
Bouchet, P. J., Miller, D. L., Roberts, J. J., Mannocci, L., Harris, C. M., & Thomas, L. (2020). dsmextra: Extrapolation assessment tools for density surface models. Methods in Ecology and Evolution. https://doi.org/10.1111/2041-210X.13469
Briscoe, N. J., Elith, J., Salguero‐Gómez, R., Lahoz‐Monfort, J. J., Camac, J. S., Giljohann, K. M., Holden, M. H., Hradsky, B. A., Kearney, M. R., McMahon, S. M., Phillips, B. L., Regan, T. J., Rhodes, J. R., Vesk, P. A., Wintle, B. A., Yen, J. D. L., & Guillera‐Arroita, G. (2019). Forecasting species range dynamics with process‐explicit models: Matching methods to applications. Ecology Letters, 22(11), 1940–1956. https://doi.org/10.1111/ele.13348
Di Febbraro, M., D’Amen, M., Raia, P., De Rosa, D., Loy, A., & Guisan, A. (2018). Using macroecological constraints on spatial biodiversity predictions under climate change: The modelling method matters. Ecological Modelling, 390, 79–87. https://doi.org/10.1016/j.ecolmodel.2018.10.023
Dubuis, A., Giovanettina, S., Pellissier, L., Pottier, J., Vittoz, P., & Guisan, A. (2013). Improving the prediction of plant species distribution and community composition by adding edaphic to topo-climatic variables. Journal of Vegetation Science, 24(4), 593–606. https://doi.org/10.1111/jvs.12002
Dubuis, A., Pottier, J., Rion, V., Pellissier, L., Theurillat, J.-P., & Guisan, A. (2011). Predicting spatial patterns of plant species richness: A comparison of direct macroecological and species stacking modelling approaches. Diversity and Distributions, 17(6), 1122–1131. https://doi.org/10.1111/j.1472-4642.2011.00792.x
Elith, J., Kearney, M., & Phillips, S. (2010). The art of modelling range-shifting species. Methods in Ecology and Evolution, 1(4), 330–342. https://doi.org/10.1111/j.2041-210X.2010.00036.x
Engler, J. O., & Rödder, D. (2012). Disentangling interpolation and extrapolation uncertainties in ecological niche models: A novel visualization technique for the spatial variation of predictor variable collinearity. Biodiversity Informatics, 8, 30–40.
Feng, X., Enquist, B. J., Park, D. S., Boyle, B., Breshears, D. D., Gallagher, R. V., Lien, A., Newman, E. A., Burger, J. R., Maitner, B. S., Merow, C., Li, Y., Huynh, K. M., Ernst, K., Baldwin, E., Foden, W., Hannah, L., Jørgensen, P. M., Kraft, N. J. B., … López-Hoffman, L. (2022). A review of the heterogeneous landscape of biodiversity databases: Opportunities and challenges for a synthesized biodiversity knowledge base. Global Ecology and Biogeography. https://doi.org/10.1111/geb.13497
Grenié, M., Berti, E., Carvajal‐Quintero, J., Dädlow, G. M. L., Sagouis, A., & Winter, M. (2022). Harmonizing taxon names in biodiversity data: A review of tools, databases and best practices. Methods in Ecology and Evolution. https://doi.org/10.1111/2041-210X.13802
Guisan, A., & Rahbek, C. (2011). SESAM - a new framework integrating macroecological and species distribution models for predicting spatio-temporal patterns of species assemblages. Journal of Biogeography, 38(8), 1433–1444. https://doi.org/10.1111/j.1365-2699.2011.02550.x
Hattab, T., Garzón-López, C. X., Ewald, M., Skowronek, S., Aerts, R., Horen, H., Brasseur, B., Gallet-Moron, E., Spicher, F., Decocq, G., Feilhauer, H., Honnay, O., Kempeneers, P., Schmidtlein, S., Somers, B., Van De Kerchove, R., Rocchini, D., & Lenoir, J. (2017). A unified framework to model the potential and realized distributions of invasive species within the invaded range. Diversity and Distributions, 23(7), 806–819. https://doi.org/10.1111/ddi.12566
Helmstetter, N. A., Conway, C. J., Stevens, B. S., & Goldberg, A. R. (2020). Balancing transferability and complexity of species distribution models for rare species conservation. Diversity and Distributions. https://doi.org/10.1111/ddi.13174
Jin, J., & Yang, J. (2020). BDcleaner: A workflow for cleaning taxonomic and geographic errors in occurrence data archived in biodiversity databases. Global Ecology and Conservation, 21, e00852. https://doi.org/10.1016/j.gecco.2019.e00852
Mesgaran, M. B., Cousens, R. D., & Webber, B. L. (2014). Here be dragons: A tool for quantifying novelty due to covariate range and correlation change when projecting species distribution models. Diversity and Distributions, 20(10), 1147–1159. https://doi.org/10.1111/ddi.12209
Montti, L., Velazco, S. J. E., Travis, J. M. J., & Grau, H. R. (2021). Predicting current and future global distribution of invasive Ligustrum lucidum W.T. Aiton: Assessing emerging risks to biodiversity hotspots. Diversity and Distributions, 27(8), 1568–1583. https://doi.org/10.1111/ddi.13303
Ovaskainen, O., & Abrego, N. (2020). Joint species distribution modelling. Cambridge University Press.
Owens, H. L., Campbell, L. P., Dornak, L. L., Saupe, E. E., Barve, N., Soberón, J., Ingenloff, K., Lira-Noriega, A., Hensz, C. M., Myers, C. E., & Peterson, A. T. (2013). Constraints on interpretation of ecological niche models by limited environmental ranges on calibration areas. Ecological Modelling, 263, 10–18. https://doi.org/10.1016/j.ecolmodel.2013.04.011
Pollock, L. J., O’Connor, L. M. J., Mokany, K., Rosauer, D. F., Talluto, M. V., & Thuiller, W. (2020). Protecting biodiversity (in all its complexity): New models and methods. Trends in Ecology & Evolution, 35(12), 1119–1128. https://doi.org/10.1016/j.tree.2020.08.015
Pottier, J., Dubuis, A., Pellissier, L., Maiorano, L., Rossier, L., Randin, C. F., Vittoz, P., & Guisan, A. (2013). The accuracy of plant assemblage prediction from species distribution models varies along environmental gradients. Global Ecology and Biogeography, 22(1), 52–63. https://doi.org/10.1111/j.1466-8238.2012.00790.x
Ribeiro, B. R., Velazco, S. J. E., Guidoni‐Martins, K., Tessarolo, G., Jardim, L., Bachman, S. P., & Loyola, R. (2022). bdc: A toolkit for standardizing, integrating and cleaning biodiversity data. Methods in Ecology and Evolution. https://doi.org/10.1111/2041-210X.13868
Roberts, D. R., Bahn, V., Ciuti, S., Boyce, M. S., Elith, J., Guillera-Arroita, G., Hauenstein, S., Lahoz-Monfort, J. J., Schröder, B., Thuiller, W., Warton, D. I., Wintle, B. A., Hartig, F., & Dormann, C. F. (2017). Cross-validation strategies for data with temporal, spatial, hierarchical, or phylogenetic structure. Ecography, 40, 913–929. https://doi.org/10.1111/ecog.02881
Rodrigues, A. V., Nakamura, G., Staggemeier, V. G., & Duarte, L. (2022). Species misidentification affects biodiversity metrics: Dealing with this issue using the new R package naturaList. Ecological Informatics, 69, 101625. https://doi.org/10.1016/j.ecoinf.2022.101625
Stohlgren, T. J., Jarnevich, C. S., Esaias, W. E., & Morisette, J. T. (2011). Bounding species distribution models. Current Zoology, 57(5), 642–647. https://doi.org/10.1093/czoolo/57.5.642
Thuiller, W. (2013). On the importance of edaphic variables to predict plant species distributions—Limits and prospects. Journal of Vegetation Science, 24(4), 591–592. https://doi.org/10.1111/jvs.12076
Tomlinson, S., Lewandrowski, W., Elliott, C. P., Miller, B. P., & Turner, S. R. (2020). High‐resolution distribution modeling of a threatened short‐range endemic plant informed by edaphic factors. Ecology and Evolution, 10(2), 763–777. https://doi.org/10.1002/ece3.5933
Valavi, R., Elith, J., Lahoz‐Monfort, J. J., & Guillera‐Arroita, G. (2019). blockCV: An R package for generating spatially or environmentally separated folds for k‐fold cross‐validation of species distribution models. Methods in Ecology and Evolution, 10(2), 225–232. https://doi.org/10.1111/2041-210X.13107
Velazco, S. J. E., Galvão, F., Villalobos, F., & De Marco Jr, P. (2017). Using worldwide edaphic data to model plant species niches: An assessment at a continental extent. PLoS ONE, 12(10), e0186025. https://doi.org/10.1371/journal.pone.0186025
Wenger, S. J., & Olden, J. D. (2012). Assessing transferability of ecological models: An underappreciated aspect of statistical validation. Methods in Ecology and Evolution, 3(2), 260–267. https://doi.org/10.1111/j.2041-210X.2011.00170.x
Zizka, A., Silvestro, D., Andermann, T., Azevedo, J., Duarte Ritter, C., Edler, D., Farooq, H., Herdean, A., Ariza, M., Scharn, R., Svantesson, S., Wengström, N., Zizka, V., & Antonelli, A. (2019). CoordinateCleaner: Standardized cleaning of occurrence records from biological collection databases. Methods in Ecology and Evolution. https://doi.org/10.1111/2041-210X.13152
Zurell, D., Elith, J., & Schröder, B. (2012). Predicting to new environments: Tools for visualizing model behaviour and impacts on mapped distributions. Diversity and Distributions, 18(6), 628–634. https://doi.org/10.1111/j.1472-4642.2012.00887.x
Zurell, D., Zimmermann, N. E., Gross, H., Baltensweiler, A., Sattler, T., & Wüest, R. O. (2020). Testing species assemblage predictions from stacked and joint species distribution models. Journal of Biogeography, 47(1), 101–113. https://doi.org/10.1111/jbi.13608

Source

    © 2022 the Reviewer.

Content of review 2, reviewed on March 16, 2023

I reviewed the manuscript “Pervasive impacts of climate change on the woodiness and ecological generalism of dry forest plant assemblages”. I am very pleased to read this new version of the manuscript, which is much improved in the clarity of its wording, methodology, and analysis. I thank the authors for replying to and addressing all my comments. The manuscript would be an important contribution to estimating and understanding the potential effect of climate change on the Caatinga ecoregion. However, I still have concerns about the validation method and the spatial restriction of future distributions, which are not suitable for research that aims to project models onto future conditions. See my comments below.

Regarding my comments in the last revision (L149-151). The authors replied: “We recognized that the reviewer’s comment is valid. However, the influence of spatial configuration and number of occurrence records in assigning environmentally or geographically partitions creates a scenario that is nearly inoperable to iterate across 3K species. For example, in using one of the simplest geographical partitions (checkboard blocks), we restrict the number of training presences within each block to half, which may decrease model performance. To strengthen our approach, we changed our validation partition from 4-fold to 5-folds and performed the Mobility-Oriented Parity analysis (MOP) for each species projection (Owens et al., 2013). We then filtered the estimates of habitat suitability under three different scenarios of environmental similarity (MOP ≥ 0.7; MOP ≥ 0.8; MOP ≥ 0.9). Please, check lines 201-213.”

This is not a justification for not using spatially structured cross-validation for species with a larger number of presences. You need not be limited to a single partitioning method. Within the same study, you can use different validation approaches depending on the amount of occurrence data; for example, repeated k-fold for ensembles of small models, k-fold for species with between 15 and 30 occurrences, and spatially structured partitions (bands or blocks) for species with >30 occurrences (see Pimenta et al., 2022). Spatially structured partitions are necessary for high-standard SDMs and are especially important when models are projected to other regions or time periods, because these partitioning approaches evaluate the transferability of the models and reduce inflation in performance metrics (Araújo et al., 2019; Santini et al., 2021; Sofaer et al., 2019). In addition, extrapolation metrics (like MOP) do not inform on the models’ predictive capability; they only measure environmental novelty. Please run the models for species with >30 or >50 occurrences with spatially structured partitions.
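
The rule I am suggesting can be written down in a few lines, which is why I do not think iterating it over about 3,000 species is inoperable. The Python sketch below is only an illustration: the function name and labels are hypothetical, and the lower cutoff of 15 occurrences for the ensemble-of-small-models case is my assumption, since I did not state one above.

```python
def choose_partition(n_occurrences, small_max=14, spatial_min=31):
    """Pick a validation scheme from the number of occurrences of a species.

    small_max and spatial_min are illustrative cutoffs: up to 14 records use
    repeated k-fold (ensemble of small models), 15 to 30 use k-fold, and more
    than 30 use a spatially structured partition (bands or blocks).
    """
    if n_occurrences <= small_max:
        return "repeated_kfold"
    if n_occurrences < spatial_min:
        return "kfold"
    return "spatial_block"
```

Applied to a table of species and their occurrence counts, this returns the scheme to use for each species, so the partitioning step can still be automated.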

About my comment related to delimiting a species’s future distribution.
“Reply: We appreciate the careful thought about our approach, but we believe the reviewer might have missed the details on spatial constraints applied in our study. Our initial submission included two different constraints:
1) A definition of species accessible area based on a buffer (around presence records) with a width size equal to the maximum nearest neighbor distance among pairs of occurrences (buffer size varied across species). This step was properly noted by the reviewer.
2) A spatial restriction was applied a posteriori to remove suitable habitats considered unreachable by each species. Our approach did exactly what the reviewer’s suggested. First, “suitable & occupied” habitat patches (those overlaid with presence records) were identified, and a buffer of ‘XX km’ was applied around “suitable & occupied” patches to identify “suitable & unoccupied but reachable” patches. The ‘XX’ km was defined following the ‘OBR’ method in Mendes et al. (2020), and varied for each species. This spatial restriction was applied to current and future projections, limiting therefore species dispersal to suitable and reachable habitat patches.”
From lines 220 to 226. In this and in the previous manuscript version, you mention that you delimited an accessible area for each species. However, delimiting an accessible area does not imply that it will be the area used for model projection; it is just the area for model calibration (Cooper & Soberón, 2018). For this reason, you need to state explicitly in the text that you used the accessible area to calibrate and project current and future distributions. You also mention that you used the OBR method. In ENMTML, the a posteriori constraining methods (like OBR) are applied only to the current distribution. How did you apply the OBR approach to future scenarios? From the description in lines 220-226, it is unclear whether you are referring to current and future distributions or just the current distribution. Constraining methods like OBR are based on the interaction of current calibration data with suitability patches. Therefore, using this method to constrain future suitability patches with current occurrences makes no sense, because there is a temporal mismatch between the occurrence data and the suitability patches. I agree with using OBR to constrain the current distribution but not future ones.

L130-131. I think this table is not correct. I cannot see any information related to species and growth form. Please check this table.

L160-162. As far as I know, the only publication that proposed defining the calibration area based on a buffer with a width equal to the maximum nearest neighbor distance is Andrade et al. (2020). I recommend citing this article here.
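
For readers unfamiliar with this buffer definition, the following Python sketch shows how I understand the quantity: for each occurrence, find the distance to its nearest neighbor, then take the maximum of those distances as the buffer width. This is my own illustration, not the ENMTML code; the function names are hypothetical, and I assume great-circle (haversine) distances on a sphere of radius 6371 km.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius

def haversine_km(p, q):
    """Great-circle distance in km between two (lon, lat) points in degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    dlon, dlat = lon2 - lon1, lat2 - lat1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))

def max_nearest_neighbour_km(points):
    """Largest of the nearest-neighbor distances among occurrence points."""
    if len(points) < 2:
        raise ValueError("at least two occurrences are needed")
    nearest = [
        min(haversine_km(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ]
    return max(nearest)
```

With this definition, a single isolated occurrence can produce a very wide buffer, which is one reason the resulting calibration areas may be large.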

Script and code. For the transparency and reproducibility of your research, it would be important to provide the code used, perhaps in a GitHub or Figshare repository.

References

Andrade, A. F. A., Velazco, S. J. E., & De Marco Jr, P. (2020). ENMTML: An R package for a straightforward construction of complex ecological niche models. Environmental Modelling & Software, 125, 104615. https://doi.org/10.1016/j.envsoft.2019.104615
Araújo, M. B., Anderson, R. P., Márcia Barbosa, A., Beale, C. M., Dormann, C. F., Early, R., Garcia, R. A., Guisan, A., Maiorano, L., Naimi, B., O’Hara, R. B., Zimmermann, N. E., & Rahbek, C. (2019). Standards for distribution models in biodiversity assessments. Science Advances, 5(1), eaat4858. https://doi.org/10.1126/sciadv.aat4858
Cooper, J. C., & Soberón, J. (2018). Creating individual accessible area hypotheses improves stacked species distribution model performance. Global Ecology and Biogeography, 27(1), 156–165. https://doi.org/10.1111/geb.12678
Pimenta, M., Andrade, A. F. A. de, Fernandes, F. H. S., Amboni, M. P. de M., Almeida, R. S., Soares, A. H. S. de B., Falcon, G. B., Raíces, D. S. L., & De Marco Júnior, P. (2022). One size does not fit all: Priority areas for real world problems. Ecological Modelling, 470, 110013. https://doi.org/10.1016/j.ecolmodel.2022.110013
Santini, L., Benítez‐López, A., Maiorano, L., Čengić, M., & Huijbregts, M. A. J. (2021). Assessing the reliability of species distribution projections in climate change research. Diversity and Distributions. https://doi.org/10.1111/ddi.13252
Sofaer, H. R., Jarnevich, C. S., Pearse, I. S., Smyth, R. L., Auer, S., Cook, G. L., Edwards, T. C., Guala, G. F., Howard, T. G., Morisette, J. T., & Hamilton, H. (2019). Development and delivery of species distribution models to inform decision-making. BioScience, 69(7), 544–557. https://doi.org/10.1093/biosci/biz045

Source

    © 2023 the Reviewer.

References

    R., M. M., O., d. N. F. A., N., P. L., P., S. D., A., S. B. 2023. Pervasive impacts of climate change on the woodiness and ecological generalism of dry forest plant assemblages. Journal of Ecology.