Content of review 1, reviewed on March 02, 2022

The authors evaluate several "large data" classification methods for predicting switching movement behaviors from predictor covariates in a bald eagle dataset.

  1. Computational speeds of various methods are compared in the Introduction, but their computational complexities aren't provided. Do all of these methods have computational costs that are linear in the amount of data, or are some worse than linear? I ask because kNN is described as slow, yet kNN has a computational complexity that is linear in the amount of data, and linear complexity is usually the best one can achieve without subsetting the data. I do appreciate that the proportionality constants (computation time per data point) can be very different.

If the different methods have different computational complexities, then I think it would be additionally useful to have a computation time comparison for subsets of the data, to see how computation time scales with the amount of data. Otherwise, the current table should be sufficient.
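The subset comparison suggested here could be sketched as follows. This is only an illustration, not the authors' workflow: `fit` stands for any training routine and is a hypothetical placeholder, and the data are assumed to have been shuffled so that the first n rows form a random subset. The slope of log(time) against log(n) gives a rough empirical scaling exponent (about 1 for linear cost, about 2 for quadratic).

```python
import math
import time


def time_on_subsets(fit, data, sizes):
    """Time fit(data[:n]) for each n in sizes.

    `fit` is a hypothetical training routine; `data` is assumed to be
    shuffled already, so data[:n] is a random subset of size n.
    """
    times = []
    for n in sizes:
        start = time.perf_counter()
        fit(data[:n])
        times.append(time.perf_counter() - start)
    return times


def scaling_exponent(sizes, times):
    """Least-squares slope of log(time) against log(size)."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(t) for t in times]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    num = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    den = sum((x - x_mean) ** 2 for x in xs)
    return num / den
```

With real timings, several replicates per subset size would be needed, since single measurements on small subsets are noisy.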

  2. I couldn't follow the material under "Dichotomous response". 'y' is a boolean variable, and naturally I would assume that y-hat is an estimator of y. But the estimator y-hat is fed into the inverse-logit function, and so I would also assume that y-hat is a real variable between -Inf and +Inf. Both can't be true. From the remaining context, I take it that y-hat is an estimator not of 'y' but of logit(y), and therefore should have a different name. Should there be some variable like z=logit(y), with y-hat renamed z-hat?

  3. Since this was a comparison of methods on one dataset, I would have liked to see a plot of how the various methods performed versus the amount of data. All of the training times were relatively low and the total amount of data was fairly large, so it seems like it would be fairly easy to subset the data a few times and replicate the same analysis with less data.
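One possible reading of the renaming proposed in the "Dichotomous response" comment above is written out below. The symbol \hat{p} is an assumption introduced here for illustration; it appears neither in the manuscript nor in the comment itself.

```latex
y \in \{0, 1\}, \qquad
\hat{z} \in (-\infty, +\infty), \qquad
\hat{p} = \operatorname{logit}^{-1}(\hat{z}) = \frac{1}{1 + e^{-\hat{z}}} \in (0, 1)
```

Under this notation, \hat{z} lives on the logit scale, \hat{p} estimates \Pr(y = 1), and y itself remains boolean.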

I think a minus sign is missing on line 130?

"Data were collected at short time intervals" is vague.

I was confused that behavioral states were classified via k-means clustering, but then "month" was included as a predictor to account for variation in flight behavior.

Source

    © 2022 the Reviewer.

Content of review 2, reviewed on July 02, 2022

I have no further comments.

Source

    © 2022 the Reviewer.

References

    Silas, B., M., H. M., E., D. A., A., B. M., Sara, S., A., M. T., E., K. T. 2023. A review of supervised learning methods for classifying animal behavioural states from environmental features. Methods in Ecology and Evolution.