Content of review 1, reviewed on July 11, 2022

The authors compare plant distribution data from the citizen science platform iNaturalist with data from professional plant surveys, using Hawaii as an example, and assess different biases in citizen science data. As a side story, occurrence records of all vascular plants on Hawaii are evaluated. Assessing the quality of citizen science data in comparison to professional surveys is immensely useful for making citizen science data usable for science and conservation. The manuscript is well written and based on a sound selection of data and literature. However, a clear formulation of hypotheses is missing, and the manuscript would profit enormously from streamlining so that a clear storyline becomes visible. The authors should clarify throughout the entire manuscript whether the focus is on plant species in general (as large parts of the analysis are done with vascular plant species data in general) or only on invasive plant species (as indicated by the title). Either changing the title of the manuscript or focusing only on parts of the data would work, but there needs to be a clear story throughout the text.

Parts of the methods and the figures need to be slightly adapted to increase the quality of the study (see detailed comments), which will also have implications for the results and discussion section. Please ensure you use terms consistently. Choose either “non-native” or “introduced”. And “observer” or “volunteer”.

One more general remark is that your statement in the methods that you limited your data to Molokaʻi, Lānaʻi, and Kahoʻolawe due to data deficiency is also a result and could be discussed further. Here, limitations of citizen science data might be visible.

Comments on the manuscript are described in detail below:

l. 11: complement: “…discourage the use by scientists and conservationists”. Especially when it comes to invasive plants, conservationists should be mentioned here. Delete “professional” because the wording evokes the impression that “unprofessional scientists” would be the opposite.

l. 14-19: Parts of the methods in the abstract sound like aims of the study. Move sentences such as “we assess… the bias…” to the intro part of the abstract and clarify methods: What data did you use? How did you analyse them?
Additionally, the question arises why now all of a sudden all vascular plant species are analysed while the title evokes the impression that only invasive plants are the focus.

l. 18 & 24: Here you use the term “filtering treatments”. Describe in more detail what these are in your study.

l. 23: When predicting suitable habitat for what? For which species? I guess you mean invasive plant species.

Rephrase to: “HSMs for xy based on iNaturalist and agency datasets often produced distinct projections.” A model and a model projection are two different things.

l. 27: Start the conclusion with a strong sentence and use the “buzz words” from the title. This is a suggestion for a start: “Citizen science data from iNaturalist have the potential to complement professional survey…”

l. 54: Are citizen science data also greater in number?

l. 71: Daru et al. (2018) also mentioned a temporal bias in herbarium collections (spring, summer), i.e., towards the periods when it is more comfortable to be outside. This is worth mentioning here. Is such a bias also visible in the citizen science data you use? Obviously, Hawaii does not experience a winter like continental areas at higher latitudes, but the climatic conditions at different altitudes may nevertheless affect sampling bias.

l. 71: This statement might be confusing as you would expect a professional survey about plants to be about plants. How can it be biased towards e.g., vertebrates?

l. 72-75: Here the authors try to make general assumptions about sampling bias. However, the study by Boakes et al. (2010) is about birds (Galliformes). Is the sampling bias the same for plants?

l. 75-77: If several authors have already shown that citizen science data can complement the spatial coverage of expert-based datasets, the question arises what is new about your study? Point this out in your manuscript.

l. 80-83: “Despite the common assumption that expert data are of higher quality than that of citizen science, the difference between volunteer and professional observations requires further evaluation (Lewandowski & Specht, 2015).” Why? Give reasons and thereby strengthen the point why you did this study.

l. 94-96: This sentence lacks logic: “Predictions.. are often fitted to survey effort..”? “Prediction … may inaccurately estimate species ranges”? Predictions are not fitted to survey effort and predictions also do not estimate species ranges. Please rephrase.

l. 97: Sampling bias can partly be addressed. However, what cannot be solved by any filtering is the missing of occurrences from certain regions. Please mention this here.

l. 102-107: In l. 102-105 you describe that multiple datasets have different biases. Your conclusion (thus…) in the next sentence is that “Modeling approaches that combine multiple datasets have thus become increasingly common”. The logic is unclear to me. Why would having different biases in your datasets lead to an increasing number of studies with multiple datasets?

l. 108: Here the word “invasive” is mentioned for the first time in the introduction. If the study is supposed to focus on invasions (see title), then this group of species needs to be addressed earlier in the introduction.

l. 125: Why do you use iNaturalist as an example for a citizen science platform? Give reasons.

l. 125-137: Make sure you clearly state your aim and formulate hypotheses. This could be a promising start: “We aim to… using iNaturalist data on the Hawaiian Islands as an example. We hypothesize that… i)… ii)…”

l. 166: You mention that you included data on species origin and conservation status. Other than “native” or “non-native” to Hawaii you do not have more information on the origin of the species. Furthermore, I do not see how conservation status is included. For this, information like “threatened”, “critically endangered” etc. would need to be included, but I do not see such data in your study.

l. 215: First state that you build HSMs on different sources and then you can go on talking about single-source and multi-source models.

l. 218: How do you justify the number of pseudo-absences used? Why do you use fewer for professional survey data than for iNaturalist data? The number of pseudo-absences should be based on the number of occurrences for each species you are building an HSM for. This paper helps to select the correct number:

Barbet-Massin, M., Jiguet, F., Albert, C.H. and Thuiller, W. (2012), Selecting pseudo-absences for species distribution models: how, where and how many?. Methods in Ecology and Evolution, 3: 327-338. https://doi.org/10.1111/j.2041-210X.2011.00172.x

l. 223-234: Why did you choose these environmental variables for modelling? A selection based on plant ecology is more useful and common practice.

l. 242: Name the seven different models in the text. Why these?

l. 257: How did you assess model quality? This is not clear from the methods section. Comparing predictions and assessing model fit and quality is something fundamentally different! Make sure you do both and do not mix these up.

l. 272: Why is R mentioned here? Please state either at the beginning or the end of the methods-analysis section that you made your analysis in R.

l. 290-291: Put (n = xy) behind the % values. How many introduced species are on Hawaii? How many recorded in iNaturalist?

l. 379: Never start the discussion with a subheading. Start the paragraph with a summary of your most important findings and then go into detail and discuss your findings and put them into the context of literature one by one.

l. 388: Here, you discuss threatened species, although I do not see any results on these species. Where do these numbers come from? Make sure, if you included threatened/endangered… taxa in your study to include this properly in the methods and results.

l. 400: What is the meaning of “this familiarity may be as motivating as rarity for other taxa”?

l.416: “Biased sampling data is still useful for HSM” is not a good subheading, but a sentence. Please rephrase.

l. 508: “we did not seek to identify the “best” mode…” is not a scientifically sound sentence. Please rephrase.

l. 526: Be careful with the wording: “…each data source appeared to sample distinct environments…”. A data source does not sample. How about “each data source represents distinct environments differently”?

l. 533: This last sentence should be phrased more carefully: “In addition to professional surveys we recommend the informed use of citizen science data.” Or something like that.

Tables
Table 1 and 2: The table headers need to better describe the content of the tables. Why were arrows used in table 1?

Figures
In general, different font sizes, bold/no-bold and styles of axes labelling were used. Please unify, use the same font size and avoid bold letters.
Figure 1 and 2: Use “Plant species [%]” on the y-axes.
Figure 3 and 4: Use “Records [%]” on the y-axis.
Figure 6: On the y-axis: “Suitable area [%]”.
Figure 7: Hardly readable. Increase the font size and consider splitting lines and enlarging the figure.

Appendix/Supplementary Information
First of all, decide whether you mean appendix or supplementary information. If you display figures in that section, please number them as “Figure A1” or “Figure S1”.

Minor remarks:
l.49: “species occurrence data”
l.49: “The collection...”
l.88: Here you abbreviate “habitat suitability modeling” with HSM. But actually, you use HSM as a surrogate for “habitat suitability model”. Please clarify and use abbreviations consistently.
l.122: wild-fires (plural)
l.141 & 151: No third-level title necessary. Just use the title “Species occurrence data” and then present iNaturalist and professional survey data in two separate paragraphs.
l. 146: Why is “GBIF.org, 2020” the reference, and not “GBIF, 2020”?
l. 146: Decide to put the information in “Appendix” or in “Supplementary Information”. This is not the same.
l. 159: m a.s.l.?
l.169: mention the five categories in the text.
l. 193: “A 250 m x 250 m grid…” Please correct throughout the text.
l. 206: Why is 1.64 the threshold?
l. 313: …for the four studied invasive species…
l. 523: “… that also affect…”

Source

    © 2022 the Reviewer.

Content of review 2, reviewed on January 19, 2023

As proposed in the first round of reviews, the formulation of a clear research aim and hypotheses, and subsequently a consistent storyline, would have increased the quality of the manuscript massively. In the new version of the manuscript, I feel that this has not been achieved. Here is my explanation: In the last paragraph of the introduction, the sentence “we explored the potential of low-structure citizen science as a source of invasive plant monitoring and habitat suitability modeling data” is provided as a potential aim. Thereafter, two main components of the study are presented, followed by three research questions. The following sentence again gives the impression of naming the aim of the study (“we examine…”), followed by one hypothesis (“we hypothesize…”) and one expectation (“we expect…”). Thus, the end of the introduction still does not provide a clear research aim or clear hypotheses, and I would strongly recommend editing this section.
Despite numerous comments by both reviewers and the editor about the methodology used, questions about the reliability of the results remain. For example, I commented on the chosen number of pseudo-absences for habitat suitability modeling. Barbet-Massin et al. (2012) was cited as a reference to justify the 10,000 pseudo-absences chosen in this study. However, the abstract of that reference states that “Models fitted with a large number of pseudo-absences but equally weighted to the presences (i.e. the weighted sum of presence equals the weighted sum of pseudo-absence) produced the most accurate predicted distributions.” This underlines my concern about the chosen number of 10,000 pseudo-absences.
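To make the weighting condition quoted from Barbet-Massin et al. (2012) concrete, the following is a minimal sketch of how per-record weights could be chosen so that the weighted sum of presences equals the weighted sum of pseudo-absences. The function name and the record counts are hypothetical and are not taken from the manuscript; how such weights are passed to a model depends on the modelling software used.

```python
def equal_sum_weights(n_presences, n_pseudo_absences):
    """Return (presence_weight, pseudo_absence_weight) such that both
    classes contribute the same total weight to a model fit."""
    if n_presences <= 0 or n_pseudo_absences <= 0:
        raise ValueError("both counts must be positive")
    presence_weight = 1.0
    # Down-weight each pseudo-absence so that the class totals match.
    pseudo_absence_weight = n_presences / n_pseudo_absences
    return presence_weight, pseudo_absence_weight


# Hypothetical example: 250 presences and 10,000 pseudo-absences.
n_pres, n_abs = 250, 10_000
w_pres, w_abs = equal_sum_weights(n_pres, n_abs)

# One weight per record, presences first, then pseudo-absences.
weights = [w_pres] * n_pres + [w_abs] * n_abs
```

With these assumed counts, each pseudo-absence receives a weight of 250/10,000, so both classes sum to the same total. A large number of pseudo-absences without such weighting does not match the condition described in the cited abstract.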
I would recommend being more precise in responding to reviewer comments, to make clearer exactly what you changed, where in the manuscript, and why. The use of line numbers and detailed explanations would help. Based on my assessment, your manuscript still does not meet the quality standards of Diversity and Distributions.

Source

    © 2023 the Reviewer.

References

    Monica, D., Lucas, B. F., W., T. M. W., W., G. T. W. 2023. Citizen science can complement professional invasive plant surveys and improve estimates of suitable habitat. Diversity and Distributions.