Content of review 1, reviewed on April 27, 2025

This study investigates the inter-epidemic risk of Rift Valley Fever (RVF) in Kenya, Tanzania, and Uganda using machine learning models, with validation based on human serological data. In addition to the current risk assessment, the study projects future RVF risk under three climate change scenarios (SSP126, SSP245, and SSP370) across three time periods: 2021–2040, 2041–2060, and 2061–2080.

The research is logically described, with a well-structured and methodologically sound approach. The writing is clear and easy to follow, and the professionally prepared figures aid understanding of the results.

One of the major findings is that goat density emerges as the single most important predictor of inter-epidemic RVF risk. This finding suggests that goat populations could serve as valuable sentinels for RVF surveillance. Additionally, the study highlights that climate change is likely to significantly reshape both the spatial and temporal distribution of RVF risk in the region.

While the study is well-executed overall, I have several questions and suggestions for clarification:

First, concerning the handling of the dataset, the authors note that there were 100 outbreak points (positive samples) and 27,000 background points (negative samples) generated based on population distribution. This results in a highly imbalanced dataset. Could the authors clarify how this imbalance was addressed during model training? Although XGBoost has built-in mechanisms that can handle imbalanced datasets, such as scale_pos_weight or class weighting, it would be helpful if the authors explicitly stated whether they used any of these options and provided the relevant settings.
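To illustrate the kind of setting I am asking about, here is a minimal sketch of how the positive-class weight is commonly derived for XGBoost (negatives divided by positives). The training counts of 70 outbreak points and 19,800 background points are my reading of the manuscript's split, and the helper name and the parameter dictionary are hypothetical; this is not a claim about what the authors actually did.

```python
def scale_pos_weight(n_negative: int, n_positive: int) -> float:
    """Ratio of negative to positive samples, the usual starting value
    for XGBoost's scale_pos_weight parameter."""
    if n_positive <= 0:
        raise ValueError("need at least one positive sample")
    return n_negative / n_positive


# Assumed training counts: 70% of the 100 outbreak points, and the
# 19,800 background points reported for training.
n_pos_train = 70
n_neg_train = 19_800

weight = scale_pos_weight(n_neg_train, n_pos_train)

# Hypothetical parameter set; the dictionary would be passed on to the
# XGBoost classifier or training call used by the authors.
params = {"objective": "binary:logistic", "scale_pos_weight": weight}

print(round(weight, 1))  # 282.9
```

Reporting the chosen value (or stating that no reweighting was applied) would let readers judge how the 1:270 class ratio influenced the fitted model.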

Second, regarding the data split for training and testing, the outbreaks were divided in a 70:30 ratio, but the background points were split into 19,800 for training and 7,200 for testing, which is roughly 73:27 rather than 70:30. Was this discrepancy intentional, or is 19,800 a typographical error for 18,900? Note that an exact 70:30 split of 27,000 points would give 18,900 and 8,100, so the test count of 7,200 would then also need correcting. Clarification on this point would enhance the transparency of the methodology.
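The arithmetic behind this question can be checked with a short sketch. The counts are taken from the manuscript as I read them; the variable names are my own.

```python
total_background = 27_000
reported_train, reported_test = 19_800, 7_200

# Counts an exact 70:30 split would produce (integer arithmetic).
expected_train = total_background * 70 // 100
expected_test = total_background - expected_train

# Shares implied by the reported counts.
train_share = reported_train / total_background
test_share = reported_test / total_background

print(expected_train, expected_test)            # 18900 8100
print(f"{train_share:.1%} / {test_share:.1%}")  # 73.3% / 26.7%
```

The reported counts do sum to 27,000, so they are internally consistent; they simply correspond to a split other than 70:30.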

Third, with respect to validation using serological assays, I found the explanation of the "103 grid cells" referenced in Figure 3 and in lines 139–141 somewhat unclear. What exactly do the 103 grid cells represent? How do they relate to the 406 grid cells mentioned in line 139 and the 124,313 grid cells mentioned in line 105? A clearer explanation of how these different sets of grid cells are defined and used would be helpful. In addition, the line shown in Figure 3 does not appear to fit the data points well. Could the authors clarify the purpose of reporting the parameters listed in lines 215–216 (β, SE, and p-value)? Are they intended to demonstrate, via the low p-value, that the fitted slope is significantly different from zero?
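For reference, the sketch below shows how β, SE, and the associated test statistic relate to each other under a simple least-squares line. This assumes an ordinary linear fit, which the manuscript does not confirm, and the data values are synthetic placeholders, not the study's data.

```python
import math


def ols_slope_test(x, y):
    """Return (beta, se, t) for the slope of a simple least-squares line.

    t = beta / se is compared against a t distribution with n - 2
    degrees of freedom to obtain the p-value for H0: slope = 0.
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least three paired observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    beta = sxy / sxx
    intercept = mean_y - beta * mean_x
    # Residual sum of squares around the fitted line.
    rss = sum((yi - (intercept + beta * xi)) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    if se == 0:
        raise ValueError("perfect fit: the t statistic is undefined")
    return beta, se, beta / se


# Synthetic placeholder values: predicted risk (x) vs. seroprevalence (y).
predicted_risk = [0.1, 0.2, 0.3, 0.4, 0.5]
seroprevalence = [0.05, 0.12, 0.13, 0.22, 0.24]

beta, se, t_stat = ols_slope_test(predicted_risk, seroprevalence)
print(f"beta={beta:.3f}, SE={se:.3f}, t={t_stat:.2f}")
```

A small p-value shows only that the slope differs from zero. It says little about how closely the points follow the line, which is why a goodness-of-fit measure alongside β, SE, and p would help in interpreting Figure 3.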

Another point concerns the choice of climate change scenarios. The study includes SSP126, SSP245, and SSP370 but omits SSP585, which is often considered a high-end emissions scenario. Could the authors explain why SSP585 was not included in the analysis?

Finally, I have a few minor suggestions for improving the figures and text presentation:

  • Several figures, such as Figure 1, lack important cartographic elements like north arrows, scale bars, and color legends. Including these elements would make the otherwise professionally prepared figures more complete and easier for readers to interpret.

  • Line 11 contains a typographical error: "208" should be corrected to "2080."

In conclusion, this is a well-conducted and valuable study that addresses an important public health issue under current and future climate conditions. Addressing the points raised above would further strengthen the clarity and robustness of the manuscript.

Source

    © 2025 the Reviewer.