Content of review 1, reviewed on July 16, 2021

In this interesting study, the authors established statistical models to explore the relationship between epistasis and various biophysical features using two fairly comprehensive datasets. Binding and folding epistasis were investigated separately, revealing two different sets of features whose contributions were determined by a model selection procedure. Among other findings, the authors found that (1) the separation of the mutated residues has a negative correlation with both binding and folding epistasis, which can be mitigated by charge changes; (2) complex type contributed the most to binding epistasis, whereas hydrophobicity contributed the most to folding epistasis but not much, if at all, to binding epistasis; and (3) the important features for both binding and folding models were charge, separation distance, and residue size.

Overall, I find this study rather comprehensive, and I especially appreciate the fact that many of the findings were rigorously tested against null hypotheses, e.g., the effect of separation distance potentially being an artifact of sparse data points at long separation distances, or the leave-10%-out procedure to test the robustness of the models. Most of the findings are in accordance with intuition and provide a deeper and more quantitative understanding of the relationship between the features and epistasis. Here are my comments to the authors:

Major comments:
1. The leave-10%-out procedure is a good strategy for validating the models, but I wonder if the authors have considered using the bootstrap to resample the dataset. If so, could the authors explain why the bootstrap is not applicable or why leave-10%-out is preferred here? Considering that the 10% of the data left out was not used as a testing set for prediction here (whereas in the standard leave-p-out cross-validation procedure, the data being left out are used for prediction), the bootstrap appears to be a more appropriate resampling method, as it can generate resampled datasets of the same size as the original dataset. In addition, the inherent bias towards positive epistasis mentioned in the discussion may be investigated or mitigated by bootstrapping to obtain more balanced datasets (e.g., separate the positive and negative epistasis data, bootstrap them separately, and then combine the two resampled datasets at equal sizes). This may provide a nice addition to the paper by showing how the results would change (or not) if the models were applied to an unbiased dataset.
2. Could the authors provide the standard errors along with the mean rank values in Table 2? This can provide additional information on the robustness of the models.
3. The authors mentioned in the discussion that the models can explain only part of the epistasis and that the rest may be explained by dynamics. However, as the authors pointed out, MD simulations are too computationally expensive for this study and others of similar scale. Also, the long time-scale motions that often characterize binding and folding are known to be difficult to obtain from MD simulations. Given these challenges, have the authors considered the applicability of normal mode analysis-based methods such as essential dynamics or elastic network models? Large-scale motions of the proteins, called intrinsic dynamics, can often be computed efficiently with these methods and provide convenient “predictions” of dynamical properties, such as mean square fluctuations and cross-correlations of residues.
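
To make the balanced bootstrap suggested in major comment 1 concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not the authors' procedure: the function names, the choice to draw each sign class to half the original dataset size, the exclusion of pairs with exactly zero epistasis, and the toy data are all hypothetical. The same loop also yields a bootstrap standard error for any summary statistic, which relates to major comment 2.

```python
import numpy as np


def balanced_bootstrap(epistasis, n_resamples=1000, seed=0):
    """Yield index arrays for balanced bootstrap resamples.

    Positive and negative epistasis values are resampled separately
    (with replacement) and combined in equal numbers. Values that are
    exactly zero are left out (an assumption of this sketch).
    """
    rng = np.random.default_rng(seed)
    epistasis = np.asarray(epistasis, dtype=float)
    pos = np.flatnonzero(epistasis > 0)
    neg = np.flatnonzero(epistasis < 0)
    if pos.size == 0 or neg.size == 0:
        raise ValueError("need at least one positive and one negative value")
    # Assumption: each class contributes half of the original dataset size.
    half = epistasis.size // 2
    for _ in range(n_resamples):
        yield np.concatenate([
            rng.choice(pos, size=half, replace=True),
            rng.choice(neg, size=half, replace=True),
        ])


def bootstrap_summary(statistic, epistasis, features, **kwargs):
    """Return the mean and standard error of `statistic` over resamples."""
    epistasis = np.asarray(epistasis, dtype=float)
    features = np.asarray(features, dtype=float)
    values = np.array([
        statistic(features[idx], epistasis[idx])
        for idx in balanced_bootstrap(epistasis, **kwargs)
    ])
    # The bootstrap standard error is the spread of the resampled statistic.
    return values.mean(), values.std(ddof=1)


# Toy data (made up for illustration): a positively biased dataset.
rng = np.random.default_rng(42)
toy_separation = rng.uniform(3.0, 30.0, size=200)
toy_epistasis = rng.normal(loc=0.3, scale=1.0, size=200)


def correlation(x, y):
    return np.corrcoef(x, y)[0, 1]


mean_r, se_r = bootstrap_summary(
    correlation, toy_epistasis, toy_separation, n_resamples=200, seed=1
)
print(f"balanced-bootstrap correlation: {mean_r:.3f} +/- {se_r:.3f}")
```

In practice, the statistic passed in would be whatever the authors already report (for example, a feature's rank or coefficient from a refitted model), so that results from the original and the balanced resamples can be compared directly.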

Minor comments:
1. Could the authors please clarify why there is low significance (~0.8) yet high contribution for the separation feature in the binding model? Is it because the feature is not normalized?
2. On page 14, “in the case of binding, when there are changes in size that occur on different protein chains, there is a reduction in epistasis.” Could the authors kindly provide or clarify the data or figure that supports this claim?
3. Page 4, line 7 appears to be missing a comma or a full stop.
4. Page 9, line 3, “it’s iterations”.

Source

    © 2021 the Reviewer.

Content of review 2, reviewed on November 22, 2021

Thank you for the thorough revision and excellent response.

Source

    © 2021 the Reviewer.

References

    E., B. J., R., M. C., Marty, Y. F. 2022. Searching for a mechanistic description of pairwise epistasis in protein systems. Proteins: Structure, Function, and Bioinformatics.