Content of review 1, reviewed on March 07, 2016

The authors have satisfactorily addressed my remarks. I only have two small comments on the new analyses:

1) The comparison with other clustering methods is a good addition. The fragility of hierarchical methods and "partitioning around medoid" with respect to missing data is surprising. The authors should make data and scripts available.

2) The legends of the new Figures 3 and 4 should be clearer. As they stand, one needs to read the main text to understand that "zero, ten, twenty" refers to percentages of missing data.

Level of interest
Please indicate how interesting you found the manuscript:

An article of limited interest

Quality of written English
Please indicate the quality of language in the manuscript:

Acceptable

Declaration of competing interests
Please complete a declaration of competing interests, considering the following questions:
1. Have you in the past five years received reimbursements, fees, funding, or salary from an
organisation that may in any way gain or lose financially from the publication of this
manuscript, either now or in the future?
2. Do you hold any stocks or shares in an organisation that may in any way gain or lose
financially from the publication of this manuscript, either now or in the future?
3. Do you hold or are you currently applying for any patents relating to the content of the
manuscript?
4. Have you received reimbursements, fees, funding, or salary from an organization that
holds or has applied for patents relating to the content of the manuscript?
5. Do you have any other financial competing interests?
6. Do you have any non-financial competing interests in relation to this paper?
If you can answer no to all of the above, write 'I declare that I have no competing interests'
below. If your reply is yes to any, please give details below.

I declare that I have no competing interests.

I agree to the open peer review policy of the journal. I understand that my name will be included
on my report to the authors and, if the manuscript is accepted for publication, my named report
including any attachments I upload will be posted on the website along with the authors'
responses. I agree for my report to be made available under an Open Access Creative Commons
CC-BY license (http://creativecommons.org/licenses/by/4.0/). I understand that any comments
which I do not wish to be included in my named report can be included as confidential comments
to the editors, which will not be published.

I agree to the open peer review policy of the journal.

Authors' response to reviews: (https://static-content.springer.com/openpeerreview/art%3A10.1186%2Fs13742-016-0152-3/13742_2016_152_AuthorComment_V2.pdf)


Source

    © 2016 the Reviewer (CC BY 4.0 - source).

Content of review 2, reviewed on August 21, 2016

The manuscript presents a method to identify phylogenetically congruent genes through an agent-based modelling approach originally designed to model bird flocking. The method is concisely presented and applied to a set of Staphylococcus aureus genomes known to have evolved via large hybridisation events.

The problem is relevant and the idea has merit, but as I elaborate below, I am concerned that no effort was made to compare the approach to standard clustering approaches or to existing methods for the same problem. I am also concerned that the authors only provide results for a single dataset. The minimum standard in the field is to validate one's approach on a variety of simulated data, showing that the method performs well under these ideal conditions at least.

Major points:

1. Method only validated on a single problem instance. This is inadequate for a new method. Instead, the authors should at least show on simulated datasets covering a variety of scenarios that the algorithm is able to cluster the data correctly.

2. No comparison with other methods: As the authors correctly point out, their approach boils down to a clustering method. There are many such methods, so why should the proposed approach be preferred? Contrary to the claim two paragraphs prior to the conclusions (please number your ms pages), there are other clustering methods that do not require specifying the number of clusters. Even for those that do, there are heuristics available (elbow, silhouette, etc.). At the very least, it seems that embedding the genes in a space using a standard multidimensional scaling procedure followed by clustering (e.g. using the OPTICS algorithm used by the authors) would provide a reasonable baseline to gauge how useful the flocking approach is.
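To illustrate the kind of heuristic meant in point 2, the following is a minimal sketch of the silhouette criterion computed directly from a pairwise distance matrix between genes. It is not the authors' method: the six-gene distance matrix, the candidate labellings, and the function name `silhouette` are hypothetical and chosen only for illustration. In practice the labellings would come from whichever clustering method is being compared.

```python
def silhouette(dist, labels):
    """Mean silhouette width of a labelling, given a symmetric distance matrix."""
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    widths = []
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            widths.append(0.0)  # usual convention: singleton clusters score 0
            continue
        # a: mean distance to the other members of the same cluster
        a = sum(dist[i][j] for j in own if j != i) / (len(own) - 1)
        # b: mean distance to the nearest other cluster
        b = min(
            sum(dist[i][j] for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        widths.append((b - a) / denom if denom > 0 else 0.0)
    return sum(widths) / len(widths)


# Hypothetical toy data: six genes forming two tight groups
# (within-group distance 0.1, between-group distance 0.9).
truth = [0, 0, 0, 1, 1, 1]
dist = [
    [0.0 if i == j else (0.1 if truth[i] == truth[j] else 0.9) for j in range(6)]
    for i in range(6)
]

# Candidate labellings with two and three clusters (assumed, for illustration).
candidates = {
    2: [0, 0, 0, 1, 1, 1],
    3: [0, 0, 1, 2, 2, 2],
}
scores = {k: silhouette(dist, labels) for k, labels in candidates.items()}
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

On this toy input the two-cluster labelling scores higher than the three-cluster one, so the heuristic selects two clusters without the number being specified in advance.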

Minor Points:

3. What genomes were used as input? (accession number/date)

4. How were the orthologous groups computed?

5. How were the single-gene trees computed?

6. Given that orthologous groups were inferred, why did the authors need to map genes to USA300/TCH1516 via profile HMM? In any case, this needs to be described.

7. Paragraph right before conclusions: "The LDs of these genes with respect...". The authors probably mean ILD here. In the context of recombination, LD usually means linkage disequilibrium, which could be confusing.

8. Same sentence: the conjecture that genes that are both in the "core cluster" and hybridisation region could have *reverted back* to the core phylogeny seems highly improbable to me. Assuming these indeed follow the core phylogeny, it seems more likely that they were translocated to that region *after* the hybridisation event.

9. The labels on Fig. 3 are illegible.

Level of interest
Please indicate how interesting you found the manuscript:

An article of limited interest

Quality of written English
Please indicate the quality of language in the manuscript:

Acceptable

Declaration of competing interests

By way of full disclosure, I am the senior author of a loosely related manuscript submitted to
another journal. However, the two manuscripts use different approaches, and have very different
focuses, so they are not in competition.

I agree to the open peer review policy of the journal.

Authors' response to reviews: (https://static-content.springer.com/openpeerreview/art%3A10.1186%2Fs13742-016-0152-3/13742_2016_152_AuthorComment_V1.pdf)


References

    Apurva, N., Richard, B., Rob, D., Barun, M., Sergios-Orestis, K., Barry, K., J., P. P. 2016. Clusterflock: a flocking algorithm for isolating congruent phylogenomic datasets. GigaScience.